CDC Pipeline Engineer

Design and troubleshoot Change Data Capture pipelines using Debezium, Kafka Connect, AWS DMS, and other CDC tools for real-time data integration and streaming.

Change Data Capture is the engine behind real-time data integration, event-driven architectures, and low-latency data warehousing. Instead of batch-polling databases, CDC tools read the database transaction log and stream every insert, update, and delete as it is committed, so changes typically reach downstream systems within milliseconds to seconds rather than waiting hours for the next batch run. Building and operating CDC pipelines reliably, however, requires deep knowledge of both the source database internals and the pipeline tooling. The CDC Pipeline Engineer assistant is built for exactly that work.
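To make the idea of streaming inserts, updates, and deletes concrete, here is a minimal sketch of a consumer applying change events to an in-memory copy of a table. It assumes events shaped like the Debezium envelope (`op`, `before`, `after`, where `op` is `c` for create, `u` for update, `d` for delete, and `r` for a snapshot read). The function name `apply_change_event`, the `id` key field, and the sample rows are hypothetical and exist only for illustration.

```python
from typing import Any, Dict


def apply_change_event(table: Dict[Any, dict], event: dict, key_field: str = "id") -> None:
    """Apply one Debezium-style change event to an in-memory copy of a table."""
    op = event["op"]
    if op in ("c", "r", "u"):
        # Creates, snapshot reads, and updates all carry the new row state in "after".
        row = event["after"]
        table[row[key_field]] = row
    elif op == "d":
        # Deletes carry the old row state in "before"; only the key is needed here.
        table.pop(event["before"][key_field], None)
    else:
        raise ValueError(f"unsupported op: {op!r}")


# Hypothetical sample events, in commit order.
events = [
    {"op": "c", "before": None, "after": {"id": 1, "email": "old@example.com"}},
    {"op": "c", "before": None, "after": {"id": 2, "email": "temp@example.com"}},
    {"op": "u", "before": {"id": 1, "email": "old@example.com"},
     "after": {"id": 1, "email": "new@example.com"}},
    {"op": "d", "before": {"id": 2, "email": "temp@example.com"}, "after": None},
]

replica: Dict[Any, dict] = {}
for event in events:
    apply_change_event(replica, event)

print(replica)
```

Because each event overwrites or removes the row by key, replaying the same event twice leaves the replica unchanged, which is one common way to tolerate duplicate delivery.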

This assistant helps data engineers, platform engineers, and database administrators design, implement, and debug CDC pipelines using leading tools including Debezium, Kafka Connect, AWS Database Migration Service, Google Datastream, Azure Data Factory CDC, Airbyte, and Maxwell's Daemon. It covers source connector configuration for PostgreSQL (logical decoding with pgoutput or wal2json), MySQL (binlog-based capture), Oracle (LogMiner), SQL Server (CDC tables or transaction log), and MongoDB (change streams).

The assistant generates connector configuration JSON and explains replication slot management for PostgreSQL, binlog retention strategies for MySQL, and schema evolution handling across pipeline stages. It addresses the full pipeline: from source connector tuning and Kafka topic design, through schema registry integration, to sink connector configuration for targets including data warehouses, search indexes, caches, and downstream databases.
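As an illustration of the kind of connector configuration JSON described above, the following sketch builds a registration payload for a Debezium PostgreSQL source connector using the `pgoutput` plugin. The property names follow Debezium 2.x (older releases use `database.server.name` instead of `topic.prefix`). The helper `build_postgres_connector`, the host, the credentials, the slot and publication names, and the table list are all placeholder assumptions, not values from any real deployment.

```python
import json
from typing import List


def build_postgres_connector(
    name: str,
    hostname: str,
    dbname: str,
    user: str,
    password: str,
    tables: List[str],
    topic_prefix: str = "app",
    slot_name: str = "debezium_slot",       # hypothetical slot name
    publication_name: str = "dbz_publication",  # hypothetical publication name
    port: int = 5432,
) -> dict:
    """Build a Kafka Connect registration payload for a Debezium PostgreSQL source."""
    return {
        "name": name,
        "config": {
            "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
            "plugin.name": "pgoutput",            # built-in logical decoding plugin
            "database.hostname": hostname,
            "database.port": str(port),
            "database.user": user,
            "database.password": password,        # placeholder; prefer externalized secrets
            "database.dbname": dbname,
            "topic.prefix": topic_prefix,         # prefix for the emitted Kafka topics
            "slot.name": slot_name,               # replication slot the connector uses
            "publication.name": publication_name,
            "table.include.list": ",".join(tables),  # schema-qualified table names
        },
    }


payload = build_postgres_connector(
    name="inventory-source",
    hostname="db.example.internal",
    dbname="inventory",
    user="cdc_user",
    password="change-me",
    tables=["public.orders", "public.customers"],
)
print(json.dumps(payload, indent=2))
```

A payload of this shape is what gets sent to the Kafka Connect REST API with `POST /connectors`. Giving each connector its own `slot.name` matters because a replication slot can only be consumed by one client at a time.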

For teams troubleshooting existing pipelines, the assistant diagnoses common failure modes: connector task failures, schema change handling errors, consumer lag accumulation, duplicate event processing, and replication slot bloat. It provides structured debugging workflows and explains how to recover pipelines after source database schema changes or connector restarts. Ideal users include data engineers building real-time ETL systems, platform teams implementing event sourcing, and DBAs managing CDC-based replication between operational databases.
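One of the failure modes listed above, replication slot bloat, can be checked with a short script. The sketch below assumes PostgreSQL 10 or later and pairs a query against `pg_replication_slots` with a small pure function that flags slots which are inactive or retaining more WAL than a threshold. The function `flag_bloated_slots`, the 5 GiB threshold, and the sample rows are illustrative assumptions; the query is shown as a string and is not executed here.

```python
from typing import Iterable, List, Optional, Tuple

# Query to run against the source database (PostgreSQL 10+).
# retained_bytes is the WAL the server must keep for each slot.
SLOT_QUERY = """
SELECT slot_name,
       active,
       pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS retained_bytes
FROM pg_replication_slots;
"""

SlotRow = Tuple[str, bool, Optional[int]]


def flag_bloated_slots(
    rows: Iterable[SlotRow],
    max_retained_bytes: int = 5 * 1024**3,  # assumed threshold: 5 GiB
) -> List[Tuple[str, str]]:
    """Return (slot_name, reason) for slots that need attention."""
    flagged = []
    for slot_name, active, retained_bytes in rows:
        if retained_bytes is None:
            # No restart_lsn yet, so the slot is not holding back any WAL.
            continue
        if not active:
            # An inactive slot keeps WAL until a consumer reconnects or it is dropped.
            flagged.append((slot_name, "inactive"))
        elif retained_bytes > max_retained_bytes:
            flagged.append((slot_name, "lagging"))
    return flagged


# Hypothetical rows, shaped like the result of SLOT_QUERY.
sample_rows: List[SlotRow] = [
    ("debezium_orders", True, 120 * 1024**2),   # healthy: active, 120 MiB retained
    ("debezium_legacy", False, 40 * 1024**3),   # abandoned connector
    ("debezium_events", True, 9 * 1024**3),     # active but far behind
    ("unused_slot", False, None),               # never consumed
]

flagged_slots = flag_bloated_slots(sample_rows)
print(flagged_slots)
```

Inactive slots are flagged regardless of size because the WAL they pin keeps growing until the consumer returns or the slot is dropped, which can eventually fill the source database's disk.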
