Data Ingestion & CDC Pipeline Engineer

Build reliable data ingestion pipelines and change data capture systems using Debezium, Kafka Connect, Airbyte, or custom connectors for database and API sources.

Getting data into your platform reliably is the first and most fundamental data engineering problem. Every downstream transformation, model, and dashboard depends on ingestion working correctly — and yet ingestion pipelines are where many of the hardest operational problems live: transient API failures, schema drift from source systems, database replication lag, connector credential rotation, and the subtle correctness issues introduced by change data capture.

The Data Ingestion & CDC Pipeline Engineer specializes in the design and implementation of data ingestion systems — both batch ingestion from APIs, files, and databases, and real-time change data capture from operational databases. It covers connector-based ingestion with Airbyte, Fivetran, Stitch, and Kafka Connect; CDC implementation with Debezium for PostgreSQL, MySQL, SQL Server, and MongoDB; custom Python ingestion scripts with retry and idempotency logic; and API ingestion patterns including pagination, rate limiting, and incremental cursor management.
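As a rough illustration of the API ingestion patterns mentioned above, the sketch below combines pagination, retry with exponential backoff, an incremental cursor, and idempotent writes. Everything named here is hypothetical: `TransientError`, `with_retry`, `ingest_incremental`, the `fetch_page` callable, and the `id` and `updated_at` fields are assumptions for the example, not part of any particular API or connector. It also assumes timestamps are ISO-8601 strings in a single format and time zone, so that string comparison matches chronological order.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, str]
# Hypothetical fetcher: (cursor, page_token) -> (records, next_page_token or None).
FetchPage = Callable[[str, Optional[str]], Tuple[List[Record], Optional[str]]]


class TransientError(Exception):
    """Hypothetical marker for retryable failures (timeouts, HTTP 429/5xx)."""


def with_retry(call, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Run call(), retrying TransientError with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except TransientError:
            if attempt == max_attempts:
                raise  # give up and surface the failure to the caller
            sleep(base_delay * 2 ** (attempt - 1))


def ingest_incremental(fetch_page: FetchPage, sink: Dict[str, Record],
                       cursor: str, sleep=time.sleep) -> str:
    """Pull every page newer than `cursor`, upsert by id, return the new cursor."""
    new_cursor = cursor
    page_token: Optional[str] = None
    while True:
        records, page_token = with_retry(
            lambda: fetch_page(cursor, page_token), sleep=sleep
        )
        for rec in records:
            # Upsert keyed on the record id: replaying a page after a crash
            # overwrites the same rows instead of duplicating them.
            sink[rec["id"]] = rec
            new_cursor = max(new_cursor, rec["updated_at"])
        if page_token is None:
            return new_cursor
```

In a real pipeline the returned cursor would be persisted only after the sink write has committed, so that a failure between the two steps causes a harmless replay rather than a gap.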

For CDC specifically, this role addresses the details that determine whether your CDC pipeline is actually correct: Debezium connector configuration for different database engines, log retention requirements on source databases, initial snapshot strategies, handling of schema evolution events, dead-letter queue patterns for poison pill messages, and the downstream processing patterns that correctly reconstruct the current state from a stream of change events.
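To make the last two items concrete, here is a minimal sketch of folding a stream of change events into current state while routing malformed events to a dead-letter list. It assumes events shaped like an unwrapped Debezium envelope (an `op` of `c`, `r`, `u`, or `d` with `before` and `after` row images), delivered in order per key, with a primary key column named `id`. The function name `apply_change_events` and the in-memory dictionaries are hypothetical stand-ins for a real sink and a real dead-letter topic.

```python
from typing import Dict, Hashable, List, Optional, Tuple

Event = Optional[Dict[str, object]]


def apply_change_events(
    events: List[Event], key_field: str = "id"
) -> Tuple[Dict[Hashable, dict], List[dict]]:
    """Fold change events into current state; return (state, dead_letters)."""
    state: Dict[Hashable, dict] = {}
    dead_letters: List[dict] = []
    for event in events:
        if event is None:
            # A null value is a tombstone that follows a delete; nothing to apply.
            continue
        try:
            op = event["op"]
            if op in ("c", "r", "u"):  # create, snapshot read, update
                row = event["after"]
                state[row[key_field]] = row
            elif op == "d":  # delete: only the before image is populated
                state.pop(event["before"][key_field], None)
            else:
                raise ValueError(f"unsupported op: {op!r}")
        except (KeyError, TypeError, ValueError) as exc:
            # Poison pill: set it aside with the error instead of halting the stream.
            dead_letters.append({"event": event, "error": repr(exc)})
    return state, dead_letters
```

Treating snapshot reads and creates as upserts keeps the fold idempotent when the initial snapshot overlaps with streamed changes. Events that cannot be applied are kept with their error so they can be inspected and replayed later.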

You can bring a specific ingestion requirement, such as replicating a production PostgreSQL database to your lakehouse in near real-time, ingesting a paginated REST API with an incremental timestamp cursor, or consolidating flat files dropped in S3, and receive a complete implementation plan with connector configuration, custom code, and an operational runbook.

Ideal for data engineers setting up new data sources, platform teams standardizing their ingestion layer, and engineers replacing brittle custom ingestion scripts with more robust CDC-based replication.
