Skip to content

Configure Postgres for Replication

Set up Postgres with the correct permissions and settings for ETL logical replication

This guide covers the essential Postgres settings, slots, publications, and version-specific features needed for logical replication with ETL.

Using a Supabase-hosted database? Also see the Supabase product guide to database replication. It covers the Supabase-facing replication workflow and is the best companion reference when you are using ETL against a Supabase project.

  • PostgreSQL 14, 15, 16, 17, or 18 (officially supported and tested versions)
    • PostgreSQL 15+ recommended for advanced publication filtering (column-level, row-level, FOR TABLES IN SCHEMA)
    • PostgreSQL 16+ required when logical replication is read from a physical read replica
    • PostgreSQL 14 supported with table-level filtering only
  • Superuser access to the Postgres server
  • Ability to restart Postgres (required for wal_level changes)

Set wal_level = logical to enable Postgres to record logical change data in the WAL, which external tools can then decode and stream.

# postgresql.conf
wal_level = logical

Restart Postgres after changing this setting.

Replication slots ensure Postgres retains WAL data for replication consumers, even if they disconnect temporarily. They are:

  • Persistent markers that track replication progress
  • WAL retention mechanisms that prevent cleanup until consumers catch up
  • Consistency guarantees across disconnections
-- Create a logical replication slot
SELECT pg_create_logical_replication_slot('my_slot', 'pgoutput');
-- See all replication slots
SELECT slot_name, slot_type, active, restart_lsn
FROM pg_replication_slots;
-- Drop a replication slot when no longer needed
SELECT pg_drop_replication_slot('my_slot');

Warning: Only delete slots when you are sure they are not in use. Deleting an active slot will break replication.

Controls how many replication slots Postgres can maintain simultaneously.

# postgresql.conf (default is 10)
max_replication_slots = 20

ETL uses a single replication slot for its main apply worker. Additional slots are created for parallel table copies during initial sync or when new tables are added to the publication. The max_table_sync_workers pipeline parameter controls parallel copies, so total slots used by ETL never exceed max_table_sync_workers + 1.

When to increase:

  • Running multiple ETL pipelines against the same database
  • Development/testing environments with frequent slot creation

Controls the maximum number of concurrent connections for streaming replication. Each replication slot uses one WAL sender connection.

# postgresql.conf (default is 10)
max_wal_senders = 20

Set this to at least max_replication_slots to ensure all slots can connect.

Determines how much WAL data to retain on disk, providing a safety buffer for replication consumers.

# postgresql.conf
wal_keep_size = 1GB

This setting:

  • Prevents WAL deletion when replication consumers fall behind
  • Provides recovery time if ETL pipelines temporarily disconnect
  • Balances disk usage with replication reliability

Replication slots prevent Postgres from deleting WAL files until all consumers have processed them. This can cause significant disk usage if the pipeline falls behind or encounters errors.

1. Tables in Errored State

When a table enters an errored state, ETL keeps its replication slot active to maintain data consistency. This prevents WAL cleanup for that slot, causing Postgres to accumulate WAL files. If you have tables stuck in an errored state:

  • Investigate and resolve the error cause
  • Remove the table from the publication if no longer needed
  • Increase available disk space as a temporary measure

2. Slow Pipeline Performance

If your destination cannot keep up with the rate of changes in Postgres, WAL will accumulate. Common scenarios:

  • High destination latency (network or processing)
  • Large transactions generating many changes at once
  • Destination temporarily unavailable

3. Long-Running Initial Table Copies

During initial sync, ETL creates a replication slot for each table being copied. Large tables with millions of rows can take significant time to copy, during which Postgres continues accumulating WAL.

Warning: If WAL grows beyond the configured limit, Postgres will terminate the replication slot. Control this with max_slot_wal_keep_size:

# postgresql.conf
# -1 = unlimited (dangerous for disk space)
max_slot_wal_keep_size = 10GB

If a slot is terminated due to exceeding this limit, ETL will restart the table sync from scratch.

-- Check replication slot WAL usage.
SELECT slot_name,
active,
wal_status,
pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS retained_wal_bytes,
pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS confirmed_flush_lag_bytes,
safe_wal_size
FROM pg_replication_slots;
-- Check total WAL directory size
SELECT pg_size_pretty(sum(size)) AS wal_size
FROM pg_ls_waldir();

wal_status reports whether the WAL required by a replication slot is still available:

  • reserved: required WAL is within normal retention.
  • extended: required WAL exceeds max_wal_size, but Postgres is still retaining it.
  • unreserved: required WAL is no longer fully reserved and may be removed at the next checkpoint.
  • lost: required WAL was removed and the slot is no longer usable.
  • NULL: the slot has not reserved WAL yet, usually because restart_lsn is NULL.

ETL API responses return unknown future Postgres wal_status values as unknown.

  • Set max_slot_wal_keep_size to a reasonable limit based on available disk space
  • Monitor confirmed_flush_lag_bytes for logical replication lag
  • Monitor retained_wal_bytes, safe_wal_size, and wal_status for WAL retention risk
  • Treat safe_wal_size IS NULL as unlimited slot WAL retention, and safe_wal_size = 0 as no remaining headroom
  • Alert when slots fall behind acceptable thresholds
  • Address errored tables promptly to prevent indefinite WAL accumulation
  • Size initial sync workers appropriately (max_table_sync_workers) to balance parallelism with resource usage

ETL can read logical replication from a physical read replica when the replica runs PostgreSQL 16 or newer. PostgreSQL 14 and 15 can still be used with ETL, but logical decoding must run on the primary. See the PostgreSQL documentation on logical slots on hot standby.

When using a read replica:

  • Configure pg_connection to point at the replica. ETL uses this connection for logical replication, table copy, schema reads, publications, slots, keepalives, and status updates.
  • Configure store_pg_connection when using PostgresStore and pg_connection points at a read-only replica. The store connection must be writable because it runs store migrations and persists pipeline state.
  • Apply ETL source migrations on the primary before starting the pipeline. Standby connections are read-only, so ETL skips source migration execution when the configured source is in recovery.
  • Let ETL create its logical replication slots on the read replica. Do not pre-create ETL logical slots on the primary for this mode.

Publication, table, and ETL source-migration changes are ordinary WAL records. When you create them on the primary, the read replica can only see them after replay reaches that WAL position. Do not wait a fixed number of seconds; wait for a concrete replay LSN:

-- Run on the primary after creating tables, source migrations, and publications.
SELECT pg_current_wal_flush_lsn();

Then wait on the read replica until replay reaches that LSN. If you just created a new database on the primary, connect to an existing maintenance database such as postgres for this check; the new database might not exist on the replica until replay catches up.

-- Run on the read replica before starting ETL.
SELECT pg_last_wal_replay_lsn() >= '0/16B6C50'::pg_lsn AS ready;

For extra confidence, also check that the expected publication is visible on the replica:

SELECT 1 FROM pg_publication WHERE pubname = 'my_publication';

If ETL only receives the replica pg_connection, it cannot derive the primary's setup LSN itself. The orchestrator that creates or updates the source-side schema and publication should perform this LSN barrier before starting the pipeline, or should retry pipeline startup until the replica catches up. Once the logical slot exists on the replica, ongoing primary writes can lag normally; ETL will decode them as the replica replays WAL and the slot keeps the replica-side restart position.

The primary must generate logical WAL, and each server needs enough sender and slot capacity for the role it plays. On the primary, count the physical slots used by read replicas. On the read replica, count the logical slots ETL creates for its apply worker and table sync workers:

wal_level = logical
max_replication_slots = 20
max_wal_senders = 20

For the physical replication link between the primary and the read replica, use a physical replication slot and enable standby feedback:

# on the standby
primary_conninfo = 'host=primary.example.com port=5432 dbname=postgres user=replicator password=...'
primary_slot_name = 'etl_read_replica'
hot_standby = on
hot_standby_feedback = on
wal_receiver_status_interval = '1s'

hot_standby_feedback helps prevent required catalog rows from being vacuumed away on the primary while standby logical slots need them. A physical slot between the primary and the standby keeps that protection across standby reconnects and restarts.

If initial copies can run for longer than your standby conflict delay, tune max_standby_streaming_delay for that replica. A larger value reduces copy cancellations at the cost of allowing more replay lag while conflicting standby queries finish.

Logical slot creation on a standby needs information about transactions running on the primary. If the primary is idle, creating a logical slot on the standby can wait until the primary emits that snapshot information. To speed this up during setup or tests, run this on the primary:

SELECT pg_log_standby_snapshot();

This is only a setup-time nudge for slot creation. ETL uses the regular PostgreSQL logical replication protocol for keepalives and status updates after streaming starts.

PostgreSQL 17+ also has logical failover slot synchronization, where failover-enabled logical slots on the primary are synchronized to standbys. That is a separate high-availability feature for resuming logical replication after promoting a standby. It is not required for ETL to read from a current read replica, and synchronized standby slots cannot be consumed on the standby while they are marked as synced.

Publications define which tables and operations to replicate.

-- Create publication for specific tables
CREATE PUBLICATION my_publication FOR TABLE users, orders;
-- Create publication for all tables (use with caution)
CREATE PUBLICATION all_tables FOR ALL TABLES;
-- Create publication for all tables in selected schemas
CREATE PUBLICATION schema_tables FOR TABLES IN SCHEMA public, analytics;
-- Include only specific operations
CREATE PUBLICATION inserts_only FOR TABLE users WITH (publish = 'insert');

Avoid publishing ETL-owned tables. If the source database is also used as the ETL state store, the etl schema contains ETL internal tables. Do not include that schema in the publication. In that setup, FOR ALL TABLES also includes ETL-owned tables, so use explicit table lists or FOR TABLES IN SCHEMA ... for customer-owned schemas instead.

ETL supports PostgreSQL partition publications with either publish_via_partition_root = true or publish_via_partition_root = false. ETL reads the effective table list from pg_publication_tables, so it tracks the same relation identities and schemas PostgreSQL uses for logical replication messages.

Publication shapepublish_via_partition_rootPostgreSQL publishes asETL tracks
FOR TABLE orders where orders is the top partitioned tabletrueordersorders
FOR TABLE orders where orders is the top partitioned tablefalseLeaf partitions under ordersThe leaf partitions
FOR TABLE orders_2026 where orders_2026 is a partitioned subtreetrueorders_2026orders_2026
FOR TABLE orders_2026 where orders_2026 is a partitioned subtreefalseLeaf partitions under orders_2026The leaf partitions under orders_2026
FOR TABLE orders_2026_01 where orders_2026_01 is a leaf partitionEitherorders_2026_01orders_2026_01
FOR ALL TABLES for the whole database, or FOR TABLES IN SCHEMA ... for selected schemas. Do not include ETL-owned tables.truePartition roots plus regular tablesPartition roots plus regular tables
FOR ALL TABLES for the whole database, or FOR TABLES IN SCHEMA ... for selected schemas. Do not include ETL-owned tables.falseLeaf partitions plus regular tablesLeaf partitions plus regular tables

For example, with this hierarchy:

orders
├── orders_2025
│ ├── orders_2025_01
│ └── orders_2025_02
└── orders_2026
├── orders_2026_01
└── orders_2026_02

FOR TABLE orders_2026 WITH (publish_via_partition_root = true) replicates the 2026 subtree as orders_2026. FOR TABLE orders_2026 WITH (publish_via_partition_root = false) replicates orders_2026_01 and orders_2026_02 as separate leaf tables.

On PostgreSQL 15+, row filters on partition publications are applied during both the initial copy and CDC. ETL uses the row filter attached to the effective publication table entry: the published root or subtree when publish_via_partition_root = true, and the published leaf relation when publish_via_partition_root = false.

Limitation: With publish_via_partition_root = true, TRUNCATE operations on individual partitions are not replicated. Execute truncates on the published partition table instead. For a top-level publication, that is the top root; for a subtree publication, that is the published subtree root.

-- This will NOT be replicated
TRUNCATE TABLE orders_2026_01;
-- This WILL be replicated
TRUNCATE TABLE orders_2026;
-- View existing publications
SELECT * FROM pg_publication;
-- See which tables are in a publication
SELECT * FROM pg_publication_tables WHERE pubname = 'my_publication';
-- Add tables to existing publication
ALTER PUBLICATION my_publication ADD TABLE products;
-- Remove tables from publication
ALTER PUBLICATION my_publication DROP TABLE products;
-- Drop publication
DROP PUBLICATION my_publication;

ETL supports PostgreSQL 14 through 18, with enhanced publication features available in newer versions:

Logical decoding on read replicas:

PostgreSQL 16 introduced logical replication slots on hot standby servers. Use this when ETL should read WAL from a physical read replica instead of the primary.

Column-Level Filtering:

-- Replicate only specific columns from a table
CREATE PUBLICATION user_basics FOR TABLE users (id, email, created_at);

Row-Level Filtering:

-- Replicate only rows that match a condition
CREATE PUBLICATION active_users FOR TABLE users WHERE (status = 'active');

Schema-Level Publications:

-- Replicate all tables in a schema
CREATE PUBLICATION schema_pub FOR TABLES IN SCHEMA public;

PostgreSQL 14 supports table-level publication filtering only. Column-level and row-level filters are not available. Filter data at the application level if selective replication is required.

FeaturePostgreSQL 14PostgreSQL 15PostgreSQL 16+
Table-level publicationYesYesYes
Column-level filteringNoYesYes
Row-level filteringNoYesYes
FOR TABLES IN SCHEMANoYesYes
Partitioned table supportYesYesYes
Logical decoding on physical read replicasNoNoYes

Minimal postgresql.conf setup:

# Enable logical replication
wal_level = logical
# Replication capacity
max_replication_slots = 20
max_wal_senders = 20
# WAL retention
wal_keep_size = 1GB
# Limit WAL retention per slot (optional but recommended)
max_slot_wal_keep_size = 10GB

After editing the configuration:

  1. Restart Postgres
  2. Create your publication:
    CREATE PUBLICATION etl_publication FOR TABLE your_table;
  3. Verify the setup:
    SHOW wal_level;
    SHOW max_replication_slots;
    SELECT * FROM pg_publication WHERE pubname = 'etl_publication';