Configure Postgres for Replication
Set up Postgres with the correct permissions and settings for ETL logical replication
This guide covers the essential Postgres concepts and configuration needed for logical replication with ETL.
Prerequisites
- PostgreSQL 14, 15, 16, 17, or 18 (officially supported and tested versions)
- PostgreSQL 15+ recommended for advanced publication filtering (column-level filters, row-level filters, FOR TABLES IN SCHEMA)
- PostgreSQL 14 supported with table-level filtering only
- Superuser access to the Postgres server
- Ability to restart Postgres (required for wal_level changes)
Enable Logical WAL
Set wal_level = logical to enable Postgres to record logical change data in the WAL, which external tools can then decode and stream.
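A minimal sketch of the change in postgresql.conf:

```ini
# Record logical change data in the WAL (restart required)
wal_level = logical
```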
Restart Postgres after changing this setting.
Replication Slots
Replication slots ensure Postgres retains WAL data for replication consumers, even if they disconnect temporarily. They are:
- Persistent markers that track replication progress
- WAL retention mechanisms that prevent cleanup until consumers catch up
- Consistency guarantees across disconnections
Creating Replication Slots
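A sketch of creating a logical slot manually, using the built-in pgoutput output plugin; the slot name etl_slot is illustrative:

```sql
-- Create a logical replication slot named etl_slot (example name)
SELECT * FROM pg_create_logical_replication_slot('etl_slot', 'pgoutput');
```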
Viewing Replication Slots
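For example, to list existing slots and their state:

```sql
SELECT slot_name, slot_type, active, restart_lsn
FROM pg_replication_slots;
```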
Deleting Replication Slots
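A sketch, again assuming the slot is named etl_slot:

```sql
-- Drop an unused slot so Postgres can reclaim the WAL it was holding
SELECT pg_drop_replication_slot('etl_slot');
```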
Warning: Only delete slots when you are sure they are not in use. Deleting an active slot will break replication.
Max Replication Slots
Controls how many replication slots Postgres can maintain simultaneously.
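For example, in postgresql.conf (10 is the Postgres default; adjust to your workload):

```ini
# Maximum number of replication slots the server can maintain (default: 10)
max_replication_slots = 10
```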
ETL uses a single replication slot for its main apply worker. Additional slots are created for parallel table copies during initial sync or when new tables are added to the publication. The max_table_sync_workers pipeline parameter controls parallel copies, so total slots used by ETL never exceed max_table_sync_workers + 1.
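For example, with max_table_sync_workers = 4 (an illustrative value), a single pipeline uses at most 5 slots. To compare current slot usage against the configured limit:

```sql
-- Quick sanity check: slots in use vs. the configured maximum
SELECT count(*) AS slots_in_use,
       current_setting('max_replication_slots') AS max_replication_slots
FROM pg_replication_slots;
```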
When to increase:
- Running multiple ETL pipelines against the same database
- Development/testing environments with frequent slot creation
Max WAL Senders
Controls the maximum number of concurrent connections for streaming replication. Each replication slot uses one WAL sender connection.
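For example, in postgresql.conf (10 is the Postgres default):

```ini
# Maximum number of concurrent WAL sender processes (default: 10)
max_wal_senders = 10
```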
Set this to at least max_replication_slots to ensure all slots can connect.
WAL Keep Size
Determines how much WAL data to retain on disk, providing a safety buffer for replication consumers.
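For example (the 1GB value is illustrative; size it to your disk budget and expected downtime):

```ini
# Minimum amount of WAL to keep on disk for replication consumers
wal_keep_size = 1GB
```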
This setting:
- Prevents WAL deletion when replication consumers fall behind
- Provides recovery time if ETL pipelines temporarily disconnect
- Balances disk usage with replication reliability
WAL Buildup and Disk Usage
Replication slots prevent Postgres from deleting WAL files until all consumers have processed them. This can cause significant disk usage if the pipeline falls behind or encounters errors.
Common Causes of WAL Buildup
1. Tables in Errored State
When a table enters an errored state, ETL keeps its replication slot active to maintain data consistency. This prevents WAL cleanup for that slot, causing Postgres to accumulate WAL files. If you have tables stuck in an errored state:
- Investigate and resolve the error cause
- Remove the table from the publication if no longer needed
- Increase available disk space as a temporary measure
2. Slow Pipeline Performance
If your destination cannot keep up with the rate of changes in Postgres, WAL will accumulate. Common scenarios:
- High destination latency (network or processing)
- Large transactions generating many changes at once
- Destination temporarily unavailable
3. Long-Running Initial Table Copies
During initial sync, ETL creates a replication slot for each table being copied. Large tables with millions of rows can take significant time to copy, during which Postgres continues accumulating WAL.
Warning: If the WAL retained for a slot grows beyond the configured limit, Postgres invalidates that slot and stops retaining WAL for it. Control this with max_slot_wal_keep_size:
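For example (the 10GB cap is illustrative; the Postgres default of -1 means unlimited retention):

```ini
# Invalidate slots that would retain more than this much WAL (default: -1, unlimited)
max_slot_wal_keep_size = 10GB
```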
If a slot is invalidated for exceeding this limit, ETL restarts the table sync from scratch.
Monitoring WAL Usage
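A sketch of a monitoring query that shows how much WAL each slot is holding back:

```sql
-- How much WAL each replication slot is forcing Postgres to keep
SELECT
    slot_name,
    active,
    wal_status,
    pg_size_pretty(
        pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
    ) AS retained_wal
FROM pg_replication_slots
ORDER BY pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) DESC;
```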
Recommendations
- Set max_slot_wal_keep_size to a reasonable limit based on available disk space
- Monitor replication slot lag and alert when it exceeds acceptable thresholds
- Address errored tables promptly to prevent indefinite WAL accumulation
- Size initial sync workers appropriately (max_table_sync_workers) to balance parallelism with resource usage
Publications
Publications define which tables and operations to replicate.
Creating Publications
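Some common forms, shown as alternatives rather than a script to run in full (etl_publication and your_table match the examples later in this guide; orders and customers are placeholder names):

```sql
-- Publish a single table
CREATE PUBLICATION etl_publication FOR TABLE your_table;

-- Publish a specific set of tables
CREATE PUBLICATION etl_publication FOR TABLE orders, customers;

-- Publish every table in the database
CREATE PUBLICATION etl_publication FOR ALL TABLES;
```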
Partitioned Tables
To replicate partitioned tables, use publish_via_partition_root = true. This tells Postgres to treat the partitioned table as a single table for replication purposes. All changes to any partition are published as changes to the parent table:
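A sketch, assuming a partitioned table named measurements:

```sql
CREATE PUBLICATION etl_publication
    FOR TABLE measurements
    WITH (publish_via_partition_root = true);
```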
Limitation: With this option enabled, TRUNCATE operations on individual partitions are not replicated. Execute truncates on the parent table instead:
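For example, with monthly partitions of measurements (partition names are illustrative):

```sql
-- Not replicated when publish_via_partition_root = true
TRUNCATE measurements_2024_01;

-- Replicated: truncate the parent table instead
TRUNCATE measurements;
```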
Managing Publications
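Typical maintenance commands (table names are placeholders):

```sql
-- Add or remove tables from an existing publication
ALTER PUBLICATION etl_publication ADD TABLE new_table;
ALTER PUBLICATION etl_publication DROP TABLE old_table;

-- List publications and the tables they include
SELECT * FROM pg_publication;
SELECT * FROM pg_publication_tables WHERE pubname = 'etl_publication';

-- Remove a publication entirely
DROP PUBLICATION etl_publication;
```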
Version-Specific Features
ETL supports PostgreSQL versions 14 through 18, with enhanced features available in newer versions:
PostgreSQL 15+ Features
Column-Level Filtering:
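A sketch; the column list is illustrative:

```sql
-- Replicate only the listed columns of your_table
CREATE PUBLICATION etl_publication
    FOR TABLE your_table (id, name, created_at);
```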
Row-Level Filtering:
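A sketch; the filter expression is illustrative:

```sql
-- Replicate only rows matching the WHERE clause
CREATE PUBLICATION etl_publication
    FOR TABLE your_table WHERE (active = true);
```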
Schema-Level Publications:
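A sketch using the public schema:

```sql
-- Publish every table in a schema (PostgreSQL 15+ syntax)
CREATE PUBLICATION etl_publication
    FOR TABLES IN SCHEMA public;
```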
PostgreSQL 14 Limitations
PostgreSQL 14 supports table-level publication filtering only. Column-level and row-level filters are not available. Filter data at the application level if selective replication is required.
Feature Compatibility Matrix
| Feature | PostgreSQL 14 | PostgreSQL 15+ |
|---|---|---|
| Table-level publication | Yes | Yes |
| Column-level filtering | No | Yes |
| Row-level filtering | No | Yes |
| Schema-level publications (FOR TABLES IN SCHEMA) | No | Yes |
| Partitioned table support | Yes | Yes |
Complete Configuration Example
Minimal postgresql.conf setup:
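A sketch that pulls together the settings from this guide (the retention values are illustrative; 10 is already the default for slots and senders):

```ini
# postgresql.conf: logical replication settings for ETL

# Required: record logical change data in the WAL (restart required)
wal_level = logical

# Replication capacity
max_replication_slots = 10
max_wal_senders = 10

# WAL retention safety limits (tune to available disk space)
wal_keep_size = 1GB
max_slot_wal_keep_size = 10GB
```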
After editing the configuration:
- Restart Postgres
- Create your publication:

  ```sql
  CREATE PUBLICATION etl_publication FOR TABLE your_table;
  ```

- Verify the setup:

  ```sql
  SHOW wal_level;
  SHOW max_replication_slots;
  SELECT * FROM pg_publication WHERE pubname = 'etl_publication';
  ```
Next Steps
- Your First Pipeline: Hands-on tutorial using these settings
- Custom Stores and Destinations: Build your own components
- ETL Architecture: How ETL uses these settings