Build Your First ETL Pipeline
15 minutes: Learn the fundamentals by building a working pipeline.
By the end of this tutorial, you'll have a complete ETL pipeline that streams data changes from Postgres to a memory destination in real time.
What You'll Build
A real-time data pipeline that:
- Monitors a Postgres table for changes
- Streams INSERT, UPDATE, and DELETE operations
- Stores replicated data in memory for immediate access
Prerequisites
- Rust toolchain (1.75 or later)
- Postgres 14+ with logical replication enabled (wal_level = logical in postgresql.conf)
- Basic familiarity with Rust and SQL
New to Postgres logical replication? Read Postgres Replication Concepts first.
Step 1: Create the Project
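The commands for this step were lost in formatting; a minimal sketch, assuming the project is named etl-tutorial:

```shell
cargo new etl-tutorial
cd etl-tutorial
```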
Add dependencies to Cargo.toml:
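The original dependency list was lost in formatting. A plausible sketch, assuming the library is published as etl and that an async runtime is required; the crate's exact name, version, and source are assumptions, so check the project's own installation instructions:

```toml
[dependencies]
etl = "0.1"                                    # hypothetical version
tokio = { version = "1", features = ["full"] } # async runtime
```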
Verify: Run cargo check and confirm it compiles without errors.
Step 2: Set Up Postgres
Connect to Postgres and create a test database:
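The original SQL was lost in formatting. A reconstruction based on the surrounding steps: the tutorial refers to a users table with two seed rows (Step 4) and a publication named my_publication (the verify query below). The database name etl_tutorial and the column definitions are assumptions:

```sql
-- Create a test database and connect to it (psql)
CREATE DATABASE etl_tutorial;
\c etl_tutorial

-- A simple table to replicate; logical replication needs a primary key
-- (or another replica identity) to stream UPDATEs and DELETEs
CREATE TABLE users (
    id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
    name TEXT NOT NULL,
    email TEXT NOT NULL
);

-- Seed two rows so the pipeline's initial copy has data to move
INSERT INTO users (name, email) VALUES
    ('Alice', 'alice@example.com'),
    ('Bob', 'bob@example.com');

-- The publication tells Postgres which tables to replicate
CREATE PUBLICATION my_publication FOR TABLE users;
```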
Verify: SELECT * FROM pg_publication WHERE pubname = 'my_publication'; returns one row.
Step 3: Write the Pipeline
Replace src/main.rs:
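The original program was lost in formatting. The sketch below shows the overall shape the tutorial implies: a connection config, a pipeline config with batching and retry settings, an in-memory destination, then start and wait. Every type and method name here (PgConnectionConfig, PipelineConfig, BatchConfig, MemoryDestination, Pipeline) is an assumption; treat this as pseudocode and consult the library's API documentation for the real names:

```rust
use std::error::Error;

#[tokio::main]
async fn main() -> Result<(), Box<dyn Error>> {
    // Connection settings for the source Postgres database.
    let pg = PgConnectionConfig {
        host: "localhost".into(),
        port: 5432,
        name: "etl_tutorial".into(),
        username: "postgres".into(),
        password: "postgres".into(), // update to match your credentials
    };

    // Pipeline configuration: which publication to follow, plus
    // batching behavior and the error retry policy.
    let config = PipelineConfig {
        publication_name: "my_publication".into(),
        batch: BatchConfig { max_size: 100, max_fill_ms: 1000 },
        max_retries: 3,
    };

    // In-memory destination: handy for testing and development.
    let destination = MemoryDestination::new();

    // Start the pipeline: it first copies the existing rows in the
    // published tables, then streams INSERT/UPDATE/DELETE events.
    let mut pipeline = Pipeline::new(pg, config, destination);
    pipeline.start().await?;

    // Block until the pipeline stops (e.g. on Ctrl+C).
    pipeline.wait().await?;
    Ok(())
}
```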
Note: Update the password field to match your Postgres credentials.
Step 4: Run the Pipeline
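The command here was lost in formatting; it is presumably just:

```shell
cargo run
```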
You should see the initial table data being copied (the two users from Step 2), then the pipeline continues running, waiting for changes.
Step 5: Test Real-Time Replication
In another terminal, make changes to the database:
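The original statements were lost in formatting. A sketch that exercises all three operations the pipeline streams (the specific values are illustrative):

```sql
-- Run these in psql against the tutorial database
INSERT INTO users (name, email) VALUES ('Carol', 'carol@example.com');
UPDATE users SET name = 'Alice Smith' WHERE name = 'Alice';
DELETE FROM users WHERE name = 'Bob';
```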
Your pipeline terminal should show these changes being captured in real time.
Cleanup
Stop the pipeline with Ctrl+C, then clean up the database:
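The cleanup statements were lost in formatting. A sketch that removes the objects created in Step 2 (the database name is an assumption):

```sql
DROP PUBLICATION my_publication;
DROP TABLE users;
-- Optionally drop the whole test database (connect to another database first):
-- DROP DATABASE etl_tutorial;
```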
What You Learned
- Publications define which tables to replicate via Postgres logical replication
- Pipeline configuration controls batching behavior and error retry policies
- Memory destinations store data in-memory, useful for testing and development
- The pipeline performs an initial table copy, then streams changes in real-time
Next Steps
- Custom Stores and Destinations: Build your own components
- Configure Postgres: Production Postgres setup
- Architecture: How ETL works internally