Skip to content

Schema Changes

How ETL handles DDL and evolving table schemas

ETL supports schema changes, and this area is actively being improved. The current implementation is intentionally conservative: the source-side event trigger captures a rich PostgreSQL-shaped snapshot, while ETL currently models only simple, well-understood column changes: adds, drops, and renames. Built-in destination support varies: BigQuery, ClickHouse, and Snowflake apply these simple changes today; DuckLake does not apply automatic schema-change DDL yet; and Iceberg is deprecated for new deployments.

For published permanent tables, ETL currently models these ALTER TABLE changes:

Source changeETL interpretation
Add a replicated columnAdd column
Drop a replicated columnDrop column
Rename a replicated columnRename column
Several of the above in one statementOne schema snapshot, diffed into adds, drops, and renames

Future behavior will expand from this base. The event trigger already emits more catalog information than ETL consumes today so the system can grow without redesigning the source message format.

ETL installs a PostgreSQL ddl_command_end event trigger named supabase_etl_ddl_message_trigger. When an ALTER TABLE statement affects a published permanent table, the trigger emits a transactional logical message with prefix supabase_etl_ddl.

That message is internal plumbing. Destinations do not receive it directly. Instead, ETL:

  1. Parses the schema-change message.
  2. Stores a new versioned table schema using the message LSN as the schema snapshot id.
  3. Invalidates the in-memory relation state for that table.
  4. Waits for PostgreSQL pgoutput to emit a fresh RELATION message before the next row event for that table.
  5. Sends destinations a public Event::Relation with the new ReplicatedTableSchema.

The important public boundary is:

... -> internal DDL message -> Relation(new schema) -> Insert/Update/Delete ...

Destinations should treat Event::Relation as the point where the active schema changes for following row events.

Relation is an ordered event, not a batch boundary. ETL batches calls to write_events() based on size and time, so a single destination batch may contain zero, one, or many schema changes, including multiple relation events for the same table.

ETL has one shared schema-change signal, but DDL behavior is implemented per destination. A destination may choose to apply DDL automatically, reject a schema change, or require operator handling.

DestinationCurrent DDL behavior
BigQueryStable support for simple add, drop, and rename column changes.
ClickHouseSupports simple add, drop, and rename column changes. ReplacingMergeTree rejects primary-key drops or renames because the ordering expression cannot be rewritten safely.
SnowflakeSupports simple add, drop, and rename column changes.
DuckLakeDoes not apply automatic schema-change DDL yet. Do not rely on Relation events to alter DuckLake tables.
IcebergDeprecated for now. Schema-change DDL is not a supported path for new deployments.
Custom destinationsDestination authors decide which Event::Relation changes to apply, reject, or handle manually.

ETL stores schemas in PostgreSQL column ordinal order (pg_attribute.attnum) and computes destination schema diffs over replicated columns only.

The current diff rules are:

  • Same ordinal position, different name: column rename.
  • Old ordinal position missing from the new schema: column drop.
  • New ordinal position missing from the old schema: column add.

This matches PostgreSQL's normal behavior for simple column operations: renames keep the same attnum, dropped columns disappear from the visible schema, and newly added columns receive new ordinal positions.

Custom destinations should handle schema changes in write_events() by watching for Event::Relation.

A practical flow is:

  1. Iterate through the batch in order, treating each relation event as a possible schema transition for that table.
  2. Flush any buffered rows/events for the old schema before processing the relation event.
  3. Compare the old destination schema with the relation event's new ReplicatedTableSchema.
  4. Mark destination metadata as Applying if the destination needs recovery bookkeeping for the DDL transition.
  5. Apply supported destination DDL for adds, drops, and renames.
  6. Mark destination metadata as Applied only after the destination schema is actually ready for following row events.
  7. Process following row events with the new schema.

The built-in BigQuery, ClickHouse, and Snowflake destinations follow this shape: they mark destination schema metadata as Applying, apply the supported add/rename/drop operations, then mark the schema as Applied. Because destination DDL is not always transactional, a crash while metadata is Applying may require manual intervention.

Other destination modules may support a narrower schema-change surface. Treat Event::Relation as the stable ETL contract, then check the destination's status and implementation before relying on automatic destination DDL.

The source event trigger intentionally observes a broad schema snapshot, but ETL currently supports the simplest safe cases:

  • ALTER TABLE ... ADD COLUMN for replicated columns.
  • ALTER TABLE ... DROP COLUMN for replicated columns.
  • ALTER TABLE ... RENAME COLUMN for replicated columns.
  • Multi-subcommand ALTER TABLE statements composed of those simple changes.
  • Changes to published permanent tables only.

The trigger ignores temporary tables, unpublished tables, generated columns, dropped-column catalog tombstones, extension-owned DDL, and non-logical-WAL databases.

These behaviors are not full destination DDL semantics yet:

  • Only ALTER TABLE is captured by the ETL DDL trigger today.
  • Type changes, nullability changes, default changes, constraint changes, identity changes, and replica-identity changes may be visible in the emitted snapshot, but they are not yet interpreted as destination DDL operations.
  • Table create/drop/rename operations are outside the current schema-change contract. Publication membership changes and table cleanup are handled by other pipeline logic.
  • A drop and re-add is not treated as a rename. It becomes a drop plus an add because PostgreSQL assigns a new ordinal position to the new column.
  • The trigger payload includes current_query for debugging only. It can contain literals and multiple statements, so it must not be treated as replayable DDL.
  • Sessions can set supabase_etl.skip_ddl_log = 'true' as an emergency opt-out while recovering a system. DDL executed with that setting enabled is not logged for ETL.

Future work will implement broader behavior on top of the richer trigger payload, but the current contract is deliberately limited to simple add, drop, and rename handling.

Schema changes are transactional logical messages, so they appear in WAL order relative to row changes. ETL updates its stored schema when it decodes the DDL message, then waits for the next RELATION message to rebuild the runtime replication and identity masks for row decoding.

This matters for custom destinations: row events after a relation event should be decoded and written using that relation event's schema. Row events before it belong to the previous schema.