Schema Changes
How ETL handles DDL and evolving table schemas
ETL supports schema changes, and this area is actively being improved. The current implementation is intentionally conservative: the source-side event trigger captures a rich PostgreSQL-shaped snapshot, while ETL currently models only simple, well-understood column changes: adds, drops, and renames. Built-in destination support varies: BigQuery applies these changes today, DuckLake is in progress, and Iceberg is deprecated for now.
Short Version
For published permanent tables, ETL currently models these ALTER TABLE
changes:
| Source change | ETL interpretation |
|---|---|
| Add a replicated column | Add column |
| Drop a replicated column | Drop column |
| Rename a replicated column | Rename column |
| Several of the above in one statement | One schema snapshot, diffed into adds, drops, and renames |
Future behavior will expand from this base. The event trigger already emits more catalog information than ETL consumes today so the system can grow without redesigning the source message format.
How It Works
ETL installs a PostgreSQL ddl_command_end event trigger named
supabase_etl_ddl_message_trigger. When an ALTER TABLE statement affects a
published permanent table, the trigger emits a transactional logical message
with prefix supabase_etl_ddl.
That message is internal plumbing. Destinations do not receive it directly. Instead, ETL:
- Parses the schema-change message.
- Stores a new versioned table schema using the message LSN as the schema snapshot id.
- Invalidates the in-memory relation state for that table.
- Waits for PostgreSQL pgoutput to emit a fresh
RELATIONmessage before the next row event for that table. - Sends destinations a public
Event::Relationwith the newReplicatedTableSchema.
The important public boundary is:
1 | |
Destinations should treat Event::Relation as the point where the active
schema changes for following row events.
Destination-Specific DDL Behavior
ETL has one shared schema-change signal, but DDL behavior is implemented per destination. A destination may choose to apply DDL automatically, reject a schema change, or require operator handling.
| Destination | Current DDL behavior |
|---|---|
| BigQuery | Stable support for simple add, drop, and rename column changes. |
| DuckLake | In progress. Do not rely on automatic DDL behavior yet. |
| Iceberg | Deprecated for now. Schema-change DDL is not a supported path for new deployments. |
| Custom destinations | Destination authors decide which Event::Relation changes to apply, reject, or handle manually. |
Diff Semantics
ETL stores schemas in PostgreSQL column ordinal order (pg_attribute.attnum)
and computes destination schema diffs over replicated columns only.
The current diff rules are:
- Same ordinal position, different name: column rename.
- Old ordinal position missing from the new schema: column drop.
- New ordinal position missing from the old schema: column add.
This matches PostgreSQL's normal behavior for simple column operations:
renames keep the same attnum, dropped columns disappear from the visible
schema, and newly added columns receive new ordinal positions.
Destination Handling
Custom destinations should handle schema changes in write_events() by
watching for Event::Relation.
A practical flow is:
- Flush any buffered rows/events for the old schema before processing the relation event.
- Compare the old destination schema with the relation event's new
ReplicatedTableSchema. - Apply supported destination DDL for adds, drops, and renames.
- Persist destination metadata only after the destination schema is applied.
- Process following row events with the new schema.
The built-in BigQuery destination follows this shape: it marks destination
schema metadata as Applying, applies add/rename/drop operations, then marks
the schema as Applied. Because BigQuery DDL is not transactional, a crash
while metadata is Applying may require manual intervention.
Other destination modules may support a narrower schema-change surface. Treat
Event::Relation as the stable ETL contract, then check the destination's
status and implementation before relying on automatic destination DDL.
Supported Scope
The source event trigger intentionally observes a broad schema snapshot, but ETL currently supports the simplest safe cases:
ALTER TABLE ... ADD COLUMNfor replicated columns.ALTER TABLE ... DROP COLUMNfor replicated columns.ALTER TABLE ... RENAME COLUMNfor replicated columns.- Multi-subcommand
ALTER TABLEstatements composed of those simple changes. - Changes to published permanent tables only.
The trigger ignores temporary tables, unpublished tables, generated columns, dropped-column catalog tombstones, extension-owned DDL, and non-logical-WAL databases.
Current Limitations
These behaviors are not full destination DDL semantics yet:
- Only
ALTER TABLEis captured by the ETL DDL trigger today. - Type changes, nullability changes, default changes, constraint changes, identity changes, and replica-identity changes may be visible in the emitted snapshot, but they are not yet interpreted as destination DDL operations.
- Table create/drop/rename operations are outside the current schema-change contract. Publication membership changes and table cleanup are handled by other pipeline logic.
- A drop and re-add is not treated as a rename. It becomes a drop plus an add because PostgreSQL assigns a new ordinal position to the new column.
- The trigger payload includes
current_queryfor debugging only. It can contain literals and multiple statements, so it must not be treated as replayable DDL. - Sessions can set
supabase_etl.skip_ddl_log = 'true'as an emergency opt-out while recovering a system. DDL executed with that setting enabled is not logged for ETL.
Future work will implement broader behavior on top of the richer trigger payload, but the current contract is deliberately limited to simple add, drop, and rename handling.
Event Ordering
Schema changes are transactional logical messages, so they appear in WAL order
relative to row changes. ETL updates its stored schema when it decodes the DDL
message, then waits for the next RELATION message to rebuild the runtime
replication and identity masks for row decoding.
This matters for custom destinations: row events after a relation event should be decoded and written using that relation event's schema. Row events before it belong to the previous schema.