14 · Change Data Capture
CDC concepts, source/sink connectors, DB→Kafka→DB pipeline
What is CDC?
Change Data Capture (CDC) captures every INSERT, UPDATE, and DELETE from a database and streams those changes as events. Downstream services subscribe to those events instead of polling the database.
Without CDC:
Service A writes to DB
Service B polls DB every 5 seconds — wasteful, delayed
With CDC:
Service A writes to DB
CDC captures the change → Kafka topic
Service B consumes the event immediately — no polling, no delay
CDC is how the LP platform keeps downstream systems (analytics, notifications, loyalty) in sync with coupon and reward store data without tight coupling.
The Pipeline: Source → Kafka → Sink
A CDC pipeline has two halves:
- Source Connector — watches the source DB (via WAL / oplog) and publishes change events to a Kafka topic. Debezium is the most common source connector.
- Sink Connector — consumes events from the Kafka topic and writes them into the target DB. The JDBC Sink Connector handles SQL targets.
Kafka sits in the middle as the durable buffer. If the sink is down, it resumes from its last committed offset when it restarts.
Avro Schema
Messages are encoded with Avro — a compact binary format with a schema. The schema is registered in Schema Registry and referenced by ID in each message.
// In lp-coupon-api, the Kafka producer serializes with Avro
// github.com/hamba/avro/v2 handles encoding
// Schema example (user_coupon event):
{
"type": "record",
"name": "UserCouponEvent",
"fields": [
{"name": "user_coupon_id", "type": "string"},
{"name": "customer_id", "type": "string"},
{"name": "status", "type": "string"},
{"name": "updated_at", "type": "long", "logicalType": "timestamp-millis"}
]
}
Producing Events — lp-coupon-api
// routes.go — Kafka producer initialized when env vars are set
producer, err := clients.NewKafkaProducer(
cfg.Kafka.BootstrapServer,
cfg.Kafka.BrokerCertContent, // TLS cert
cfg.Kafka.BrokerClientCertContent,
cfg.Kafka.BrokerClientKeyContent,
cfg.Kafka.SchemaRegistryURL,
cfg.Kafka.UpdateUserCouponTopic,
log.Zlogger,
)
// services — produce on state change
err := kafkaProducer.Produce(ctx, userCouponID, eventBytes)
If KAFKA_BOOTSTRAP_SERVER is empty (local dev), the producer is nil and Produce is never called — events are silently skipped. This means local dev works without Kafka running.
Failure Modes
The animation below shows at-least-once delivery — the most important failure mode to understand when building consumers.
- Connector lag: If the connector falls behind, consumers see delayed events. Monitor consumer group lag in the Kafka dashboard.
- Schema evolution: Adding a field to the Avro schema must be backwards-compatible (add with a default value). Removing fields breaks consumers.
- At-least-once delivery: Kafka guarantees at-least-once. Consumers must be idempotent — processing the same event twice should not cause duplicate data.
- Dead letter queue: Messages that fail to process go to a DLQ topic for manual inspection.
Key Takeaways
- CDC has two sides: Source Connector (DB→Kafka) and Sink Connector (Kafka→DB)
- Debezium reads WAL and publishes Avro-encoded events — JDBC Sink upserts them into the target DB
- Kafka is the durable buffer between source and sink — sink resumes from last offset on restart
- Avro schema is in Schema Registry — always add fields with defaults for backwards compat
- Kafka is optional in local dev — producer init is guarded by env var
- Consumers must be idempotent — Kafka delivers at-least-once, not exactly-once