Configure Postgres for Replication
Set up Postgres with the correct permissions and settings for ETL logical replication
This guide covers the essential Postgres settings, slots, publications, and version-specific features needed for logical replication with ETL.
Using a Supabase-hosted database? Also see the Supabase product guide to database replication. It covers the Supabase-facing replication workflow and is the best companion reference when you are using ETL against a Supabase project.
Prerequisites
Section titled “Prerequisites”- PostgreSQL 14, 15, 16, 17, or 18 (officially supported and tested versions)
- PostgreSQL 15+ recommended for advanced publication filtering (column-level, row-level,
FOR TABLES IN SCHEMA) - PostgreSQL 16+ required when logical replication is read from a physical read replica
- PostgreSQL 14 supported with table-level filtering only
- PostgreSQL 15+ recommended for advanced publication filtering (column-level, row-level,
- Superuser access to the Postgres server
- Ability to restart Postgres (required for
wal_levelchanges)
Enable Logical WAL
Section titled “Enable Logical WAL”Set wal_level = logical to enable Postgres to record logical change data in the WAL, which external tools can then decode and stream.
# postgresql.confwal_level = logicalRestart Postgres after changing this setting.
Replication Slots
Section titled “Replication Slots”Replication slots ensure Postgres retains WAL data for replication consumers, even if they disconnect temporarily. They are:
- Persistent markers that track replication progress
- WAL retention mechanisms that prevent cleanup until consumers catch up
- Consistency guarantees across disconnections
Creating Replication Slots
Section titled “Creating Replication Slots”-- Create a logical replication slotSELECT pg_create_logical_replication_slot('my_slot', 'pgoutput');Viewing Replication Slots
Section titled “Viewing Replication Slots”-- See all replication slotsSELECT slot_name, slot_type, active, restart_lsnFROM pg_replication_slots;Deleting Replication Slots
Section titled “Deleting Replication Slots”-- Drop a replication slot when no longer neededSELECT pg_drop_replication_slot('my_slot');Warning: Only delete slots when you are sure they are not in use. Deleting an active slot will break replication.
Max Replication Slots
Section titled “Max Replication Slots”Controls how many replication slots Postgres can maintain simultaneously.
# postgresql.conf (default is 10)max_replication_slots = 20ETL uses a single replication slot for its main apply worker. Additional slots are created for parallel table copies during initial sync or when new tables are added to the publication. The max_table_sync_workers pipeline parameter controls parallel copies, so total slots used by ETL never exceed max_table_sync_workers + 1.
When to increase:
- Running multiple ETL pipelines against the same database
- Development/testing environments with frequent slot creation
Max WAL Senders
Section titled “Max WAL Senders”Controls the maximum number of concurrent connections for streaming replication. Each replication slot uses one WAL sender connection.
# postgresql.conf (default is 10)max_wal_senders = 20Set this to at least max_replication_slots to ensure all slots can connect.
WAL Keep Size
Section titled “WAL Keep Size”Determines how much WAL data to retain on disk, providing a safety buffer for replication consumers.
# postgresql.confwal_keep_size = 1GBThis setting:
- Prevents WAL deletion when replication consumers fall behind
- Provides recovery time if ETL pipelines temporarily disconnect
- Balances disk usage with replication reliability
WAL Buildup and Disk Usage
Section titled “WAL Buildup and Disk Usage”Replication slots prevent Postgres from deleting WAL files until all consumers have processed them. This can cause significant disk usage if the pipeline falls behind or encounters errors.
Common Causes of WAL Buildup
Section titled “Common Causes of WAL Buildup”1. Tables in Errored State
When a table enters an errored state, ETL keeps its replication slot active to maintain data consistency. This prevents WAL cleanup for that slot, causing Postgres to accumulate WAL files. If you have tables stuck in an errored state:
- Investigate and resolve the error cause
- Remove the table from the publication if no longer needed
- Increase available disk space as a temporary measure
2. Slow Pipeline Performance
If your destination cannot keep up with the rate of changes in Postgres, WAL will accumulate. Common scenarios:
- High destination latency (network or processing)
- Large transactions generating many changes at once
- Destination temporarily unavailable
3. Long-Running Initial Table Copies
During initial sync, ETL creates a replication slot for each table being copied. Large tables with millions of rows can take significant time to copy, during which Postgres continues accumulating WAL.
Warning: If WAL grows beyond the configured limit, Postgres will terminate the replication slot. Control this with max_slot_wal_keep_size:
# postgresql.conf# -1 = unlimited (dangerous for disk space)max_slot_wal_keep_size = 10GBIf a slot is terminated due to exceeding this limit, ETL will restart the table sync from scratch.
Monitoring WAL Usage
Section titled “Monitoring WAL Usage”-- Check replication slot WAL usage.SELECT slot_name, active, wal_status, pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS retained_wal_bytes, pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS confirmed_flush_lag_bytes, safe_wal_sizeFROM pg_replication_slots;
-- Check total WAL directory sizeSELECT pg_size_pretty(sum(size)) AS wal_sizeFROM pg_ls_waldir();wal_status reports whether the WAL required by a replication slot is still
available:
reserved: required WAL is within normal retention.extended: required WAL exceedsmax_wal_size, but Postgres is still retaining it.unreserved: required WAL is no longer fully reserved and may be removed at the next checkpoint.lost: required WAL was removed and the slot is no longer usable.NULL: the slot has not reserved WAL yet, usually becauserestart_lsnisNULL.
ETL API responses return unknown future Postgres wal_status values as unknown.
Recommendations
Section titled “Recommendations”- Set
max_slot_wal_keep_sizeto a reasonable limit based on available disk space - Monitor
confirmed_flush_lag_bytesfor logical replication lag - Monitor
retained_wal_bytes,safe_wal_size, andwal_statusfor WAL retention risk - Treat
safe_wal_size IS NULLas unlimited slot WAL retention, andsafe_wal_size = 0as no remaining headroom - Alert when slots fall behind acceptable thresholds
- Address errored tables promptly to prevent indefinite WAL accumulation
- Size initial sync workers appropriately (
max_table_sync_workers) to balance parallelism with resource usage
Read Replicas
Section titled “Read Replicas”ETL can read logical replication from a physical read replica when the replica runs PostgreSQL 16 or newer. PostgreSQL 14 and 15 can still be used with ETL, but logical decoding must run on the primary. See the PostgreSQL documentation on logical slots on hot standby.
When using a read replica:
- Configure
pg_connectionto point at the replica. ETL uses this connection for logical replication, table copy, schema reads, publications, slots, keepalives, and status updates. - Configure
store_pg_connectionwhen usingPostgresStoreandpg_connectionpoints at a read-only replica. The store connection must be writable because it runs store migrations and persists pipeline state. - Apply ETL source migrations on the primary before starting the pipeline. Standby connections are read-only, so ETL skips source migration execution when the configured source is in recovery.
- Let ETL create its logical replication slots on the read replica. Do not pre-create ETL logical slots on the primary for this mode.
Publication, table, and ETL source-migration changes are ordinary WAL records. When you create them on the primary, the read replica can only see them after replay reaches that WAL position. Do not wait a fixed number of seconds; wait for a concrete replay LSN:
-- Run on the primary after creating tables, source migrations, and publications.SELECT pg_current_wal_flush_lsn();Then wait on the read replica until replay reaches that LSN. If you just created a new database on the primary, connect to an existing maintenance database such as postgres for this check; the new database might not exist on the replica until replay catches up.
-- Run on the read replica before starting ETL.SELECT pg_last_wal_replay_lsn() >= '0/16B6C50'::pg_lsn AS ready;For extra confidence, also check that the expected publication is visible on the replica:
SELECT 1 FROM pg_publication WHERE pubname = 'my_publication';If ETL only receives the replica pg_connection, it cannot derive the primary's setup LSN itself. The orchestrator that creates or updates the source-side schema and publication should perform this LSN barrier before starting the pipeline, or should retry pipeline startup until the replica catches up. Once the logical slot exists on the replica, ongoing primary writes can lag normally; ETL will decode them as the replica replays WAL and the slot keeps the replica-side restart position.
The primary must generate logical WAL, and each server needs enough sender and slot capacity for the role it plays. On the primary, count the physical slots used by read replicas. On the read replica, count the logical slots ETL creates for its apply worker and table sync workers:
wal_level = logicalmax_replication_slots = 20max_wal_senders = 20For the physical replication link between the primary and the read replica, use a physical replication slot and enable standby feedback:
# on the standbyprimary_conninfo = 'host=primary.example.com port=5432 dbname=postgres user=replicator password=...'primary_slot_name = 'etl_read_replica'hot_standby = onhot_standby_feedback = onwal_receiver_status_interval = '1s'hot_standby_feedback helps prevent required catalog rows from being vacuumed away on the primary while standby logical slots need them. A physical slot between the primary and the standby keeps that protection across standby reconnects and restarts.
If initial copies can run for longer than your standby conflict delay, tune max_standby_streaming_delay for that replica. A larger value reduces copy cancellations at the cost of allowing more replay lag while conflicting standby queries finish.
Logical slot creation on a standby needs information about transactions running on the primary. If the primary is idle, creating a logical slot on the standby can wait until the primary emits that snapshot information. To speed this up during setup or tests, run this on the primary:
SELECT pg_log_standby_snapshot();This is only a setup-time nudge for slot creation. ETL uses the regular PostgreSQL logical replication protocol for keepalives and status updates after streaming starts.
PostgreSQL 17+ also has logical failover slot synchronization, where failover-enabled logical slots on the primary are synchronized to standbys. That is a separate high-availability feature for resuming logical replication after promoting a standby. It is not required for ETL to read from a current read replica, and synchronized standby slots cannot be consumed on the standby while they are marked as synced.
Publications
Section titled “Publications”Publications define which tables and operations to replicate.
Creating Publications
Section titled “Creating Publications”-- Create publication for specific tablesCREATE PUBLICATION my_publication FOR TABLE users, orders;
-- Create publication for all tables (use with caution)CREATE PUBLICATION all_tables FOR ALL TABLES;
-- Create publication for all tables in selected schemasCREATE PUBLICATION schema_tables FOR TABLES IN SCHEMA public, analytics;
-- Include only specific operationsCREATE PUBLICATION inserts_only FOR TABLE users WITH (publish = 'insert');Avoid publishing ETL-owned tables. If the source database is also used as the
ETL state store, the etl schema contains ETL internal tables. Do not include
that schema in the publication. In that setup, FOR ALL TABLES also includes
ETL-owned tables, so use explicit table lists or FOR TABLES IN SCHEMA ... for
customer-owned schemas instead.
Partitioned Tables
Section titled “Partitioned Tables”ETL supports PostgreSQL partition publications with either
publish_via_partition_root = true or publish_via_partition_root = false.
ETL reads the effective table list from pg_publication_tables, so it tracks
the same relation identities and schemas PostgreSQL uses for logical
replication messages.
| Publication shape | publish_via_partition_root | PostgreSQL publishes as | ETL tracks |
|---|---|---|---|
FOR TABLE orders where orders is the top partitioned table | true | orders | orders |
FOR TABLE orders where orders is the top partitioned table | false | Leaf partitions under orders | The leaf partitions |
FOR TABLE orders_2026 where orders_2026 is a partitioned subtree | true | orders_2026 | orders_2026 |
FOR TABLE orders_2026 where orders_2026 is a partitioned subtree | false | Leaf partitions under orders_2026 | The leaf partitions under orders_2026 |
FOR TABLE orders_2026_01 where orders_2026_01 is a leaf partition | Either | orders_2026_01 | orders_2026_01 |
FOR ALL TABLES for the whole database, or FOR TABLES IN SCHEMA ... for selected schemas. Do not include ETL-owned tables. | true | Partition roots plus regular tables | Partition roots plus regular tables |
FOR ALL TABLES for the whole database, or FOR TABLES IN SCHEMA ... for selected schemas. Do not include ETL-owned tables. | false | Leaf partitions plus regular tables | Leaf partitions plus regular tables |
For example, with this hierarchy:
orders ├── orders_2025 │ ├── orders_2025_01 │ └── orders_2025_02 └── orders_2026 ├── orders_2026_01 └── orders_2026_02FOR TABLE orders_2026 WITH (publish_via_partition_root = true) replicates the
2026 subtree as orders_2026. FOR TABLE orders_2026 WITH (publish_via_partition_root = false) replicates orders_2026_01 and
orders_2026_02 as separate leaf tables.
On PostgreSQL 15+, row filters on partition publications are applied during
both the initial copy and CDC. ETL uses the row filter attached to the effective
publication table entry: the published root or subtree when
publish_via_partition_root = true, and the published leaf relation when
publish_via_partition_root = false.
Limitation: With publish_via_partition_root = true, TRUNCATE operations on individual partitions are not replicated. Execute truncates on the published partition table instead. For a top-level publication, that is the top root; for a subtree publication, that is the published subtree root.
-- This will NOT be replicatedTRUNCATE TABLE orders_2026_01;
-- This WILL be replicatedTRUNCATE TABLE orders_2026;Managing Publications
Section titled “Managing Publications”-- View existing publicationsSELECT * FROM pg_publication;
-- See which tables are in a publicationSELECT * FROM pg_publication_tables WHERE pubname = 'my_publication';
-- Add tables to existing publicationALTER PUBLICATION my_publication ADD TABLE products;
-- Remove tables from publicationALTER PUBLICATION my_publication DROP TABLE products;
-- Drop publicationDROP PUBLICATION my_publication;Version-Specific Features
Section titled “Version-Specific Features”ETL supports PostgreSQL 14 through 18, with enhanced publication features available in newer versions:
PostgreSQL 16+ Features
Section titled “PostgreSQL 16+ Features”Logical decoding on read replicas:
PostgreSQL 16 introduced logical replication slots on hot standby servers. Use this when ETL should read WAL from a physical read replica instead of the primary.
PostgreSQL 15+ Features
Section titled “PostgreSQL 15+ Features”Column-Level Filtering:
-- Replicate only specific columns from a tableCREATE PUBLICATION user_basics FOR TABLE users (id, email, created_at);Row-Level Filtering:
-- Replicate only rows that match a conditionCREATE PUBLICATION active_users FOR TABLE users WHERE (status = 'active');Schema-Level Publications:
-- Replicate all tables in a schemaCREATE PUBLICATION schema_pub FOR TABLES IN SCHEMA public;PostgreSQL 14 Limitations
Section titled “PostgreSQL 14 Limitations”PostgreSQL 14 supports table-level publication filtering only. Column-level and row-level filters are not available. Filter data at the application level if selective replication is required.
Feature Compatibility Matrix
Section titled “Feature Compatibility Matrix”| Feature | PostgreSQL 14 | PostgreSQL 15 | PostgreSQL 16+ |
|---|---|---|---|
| Table-level publication | Yes | Yes | Yes |
| Column-level filtering | No | Yes | Yes |
| Row-level filtering | No | Yes | Yes |
FOR TABLES IN SCHEMA | No | Yes | Yes |
| Partitioned table support | Yes | Yes | Yes |
| Logical decoding on physical read replicas | No | No | Yes |
Complete Configuration Example
Section titled “Complete Configuration Example”Minimal postgresql.conf setup:
# Enable logical replicationwal_level = logical
# Replication capacitymax_replication_slots = 20max_wal_senders = 20
# WAL retentionwal_keep_size = 1GB
# Limit WAL retention per slot (optional but recommended)max_slot_wal_keep_size = 10GBAfter editing the configuration:
- Restart Postgres
- Create your publication:
CREATE PUBLICATION etl_publication FOR TABLE your_table;
- Verify the setup:
SHOW wal_level;SHOW max_replication_slots;SELECT * FROM pg_publication WHERE pubname = 'etl_publication';
Next Steps
Section titled “Next Steps”- Your First Pipeline: Hands-on tutorial using these settings
- Custom Stores and Destinations: Build your own components
- ETL Architecture: How ETL uses these settings