Build Your First ETL Pipeline
15 minutes: Learn the fundamentals by building a working pipeline.
By the end of this tutorial, you'll have a complete ETL pipeline that streams data changes from Postgres to a memory destination in real-time.
What You'll Build
A real-time data pipeline that:
- Monitors a Postgres table for changes
- Streams INSERT, UPDATE, and DELETE operations
- Stores replicated data in memory for immediate access
Prerequisites
- Rust toolchain (1.75 or later)
- Postgres 14+ with logical replication enabled (wal_level = logical in postgresql.conf; see the snippet after this list)
- Basic familiarity with Rust and SQL
New to Postgres logical replication? Read Postgres Replication Concepts first.
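To check whether logical replication is already enabled, and to turn it on if it isn't, you can run the following against your server (ALTER SYSTEM requires superuser rights, and the server must be restarted for the change to take effect):

```sql
-- Must report 'logical' for the pipeline to work
SHOW wal_level;

-- If it doesn't, enable it and restart the Postgres server
ALTER SYSTEM SET wal_level = logical;
```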
Step 1: Create the Project
```bash
cargo new etl-tutorial
cd etl-tutorial
```
Add dependencies to Cargo.toml:
```toml
[dependencies]
etl = { git = "https://github.com/supabase/etl" }
tokio = { version = "1", features = ["full"] }
tracing-subscriber = { version = "0.3", features = ["env-filter"] }
```
Verify: Run cargo check and confirm it compiles without errors.
Step 2: Set Up Postgres
Connect to Postgres and create a test database:
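If you use psql, a typical invocation looks like this (adjust the user and host for your setup):

```bash
psql -U postgres -h localhost
```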
```sql
CREATE DATABASE etl_tutorial;
\c etl_tutorial

CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    name TEXT NOT NULL,
    email TEXT UNIQUE NOT NULL,
    created_at TIMESTAMP DEFAULT NOW()
);

INSERT INTO users (name, email) VALUES
    ('Alice Johnson', 'alice@example.com'),
    ('Bob Smith', 'bob@example.com');

CREATE PUBLICATION my_publication FOR TABLE users;
```
Verify: SELECT * FROM pg_publication WHERE pubname = 'my_publication'; returns one row.
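Optionally, you can also confirm which tables the publication covers:

```sql
SELECT * FROM pg_publication_tables WHERE pubname = 'my_publication';
```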
Step 3: Write the Pipeline
Replace src/main.rs:
```rust
use etl::config::{BatchConfig, PgConnectionConfig, PipelineConfig, TlsConfig};
use etl::destination::memory::MemoryDestination;
use etl::pipeline::Pipeline;
use etl::store::both::memory::MemoryStore;
use std::error::Error;
use std::time::Duration;

#[tokio::main]
async fn main() -> Result<(), Box<dyn Error>> {
    let pg_config = PgConnectionConfig {
        host: "localhost".to_string(),
        port: 5432,
        name: "etl_tutorial".to_string(),
        username: "postgres".to_string(),
        password: Some("your_password".to_string().into()), // Update this
        tls: TlsConfig {
            enabled: false,
            trusted_root_certs: String::new(),
        },
        keepalive: None,
    };

    let config = PipelineConfig {
        id: 1,
        publication_name: "my_publication".to_string(),
        pg_connection: pg_config,
        batch: BatchConfig {
            max_size: 1000,
            max_fill_ms: 5000,
        },
        table_error_retry_delay_ms: 10000,
        table_error_retry_max_attempts: 5,
        max_table_sync_workers: 4,
    };

    let store = MemoryStore::new();
    let destination = MemoryDestination::new();

    // Print destination contents periodically
    let dest_clone = destination.clone();
    tokio::spawn(async move {
        loop {
            tokio::time::sleep(Duration::from_secs(5)).await;
            let rows = dest_clone.table_rows().await;
            let events = dest_clone.events().await;
            println!("\n--- Destination State ---");
            println!("Tables: {}, Events: {}", rows.len(), events.len());
            for (table_id, table_rows) in &rows {
                println!("  Table {}: {} rows", table_id.0, table_rows.len());
            }
        }
    });

    println!("Starting pipeline...");
    let mut pipeline = Pipeline::new(config, store, destination);
    pipeline.start().await?;
    pipeline.wait().await?;

    Ok(())
}
```
Note: Update the password field to match your Postgres credentials.
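If you'd rather not hardcode credentials, one option is to read the password from an environment variable. This is a small sketch using the same PgConnectionConfig fields as above; the PGPASSWORD variable name is just a convention chosen for this example:

```rust
use etl::config::{PgConnectionConfig, TlsConfig};

// Sketch: build the connection config with the password taken from the
// PGPASSWORD environment variable instead of a hardcoded string.
fn pg_config_from_env() -> PgConnectionConfig {
    let password = std::env::var("PGPASSWORD")
        .expect("set PGPASSWORD to your Postgres password");
    PgConnectionConfig {
        host: "localhost".to_string(),
        port: 5432,
        name: "etl_tutorial".to_string(),
        username: "postgres".to_string(),
        password: Some(password.into()),
        tls: TlsConfig {
            enabled: false,
            trusted_root_certs: String::new(),
        },
        keepalive: None,
    }
}
```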
Step 4: Run the Pipeline
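From the project directory, start the pipeline:

```bash
cargo run
```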
You should see the initial table data being copied (the two users from Step 2); after that, the pipeline keeps running and waits for changes.
Step 5: Test Real-Time Replication
In another terminal, make changes to the database:
```sql
\c etl_tutorial

INSERT INTO users (name, email) VALUES ('Charlie Brown', 'charlie@example.com');
UPDATE users SET name = 'Alice Cooper' WHERE email = 'alice@example.com';
DELETE FROM users WHERE email = 'bob@example.com';
```
Your pipeline terminal should show these changes being captured in real-time.
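If you want to double-check the source side as well, you can inspect the table directly:

```sql
SELECT id, name, email FROM users ORDER BY id;
```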
Cleanup
Stop the pipeline with Ctrl+C, then clean up the database:
```sql
-- Connect to a different database first (e.g., postgres)
\c postgres
DROP DATABASE etl_tutorial;
```
What You Learned
- Publications define which tables to replicate via Postgres logical replication
- Pipeline configuration controls batching behavior and error retry policies
- Memory destinations store data in-memory, useful for testing and development
- The pipeline performs an initial table copy, then streams changes in real-time
Next Steps