Project ⏱ 12 min read

14 · Change Data Capture

CDC concepts, source/sink connectors, DB→Kafka→DB pipeline

What is CDC?

Change Data Capture (CDC) captures every INSERT, UPDATE, and DELETE from a database and streams those changes as events. Downstream services subscribe to those events instead of polling the database.

Without CDC:
  Service A writes to DB
  Service B polls DB every 5 seconds — wasteful, delayed

With CDC:
  Service A writes to DB
  CDC captures the change → Kafka topic
  Service B consumes the event immediately — no polling, no delay

CDC is how the LP platform keeps downstream systems (analytics, notifications, loyalty) in sync with coupon and reward store data without tight coupling.

The Pipeline: Source → Kafka → Sink

A CDC pipeline has two halves:

Kafka sits in the middle as the durable buffer. If the sink is down, it resumes from its last committed offset when it restarts.

Source DB (PostgreSQL / MongoDB) INSERT / UPDATE / DELETE → written to WAL / oplog
↓ WAL change record
Debezium Source Connector reads WAL · serializes change event → Avro
↓ Avro-encoded message
Kafka Topic: lp.coupon.user_coupons Avro schema in Schema Registry · offset tracked per consumer group
↓ consumer group pull
JDBC Sink Connector deserializes Avro · executes UPSERT on target DB
↓ upsert / write
Target DB (Analytics DB / Replica) row synced · no polling needed

Avro Schema

Messages are encoded with Avro — a compact binary format with a schema. The schema is registered in Schema Registry and referenced by ID in each message.

// In lp-coupon-api, the Kafka producer serializes with Avro
// github.com/hamba/avro/v2 handles encoding

// Schema example (user_coupon event):
{
  "type": "record",
  "name": "UserCouponEvent",
  "fields": [
    {"name": "user_coupon_id", "type": "string"},
    {"name": "customer_id",    "type": "string"},
    {"name": "status",         "type": "string"},
    {"name": "updated_at",     "type": "long", "logicalType": "timestamp-millis"}
  ]
}

Producing Events — lp-coupon-api

// routes.go — Kafka producer initialized when env vars are set
producer, err := clients.NewKafkaProducer(
  cfg.Kafka.BootstrapServer,
  cfg.Kafka.BrokerCertContent,      // TLS cert
  cfg.Kafka.BrokerClientCertContent,
  cfg.Kafka.BrokerClientKeyContent,
  cfg.Kafka.SchemaRegistryURL,
  cfg.Kafka.UpdateUserCouponTopic,
  log.Zlogger,
)

// services — produce on state change
err := kafkaProducer.Produce(ctx, userCouponID, eventBytes)

If KAFKA_BOOTSTRAP_SERVER is empty (local dev), the producer is nil and Produce is never called — events are silently skipped. This means local dev works without Kafka running.

Failure Modes

The animation below shows at-least-once delivery — the most important failure mode to understand when building consumers.

Simulate at-least-once delivery
Scenario: msg-id abc123 (same row update)
Kafka Topic delivers msg abc123 → Sink Connector receives
Sink executes UPSERT → row written to Target DB
Sink commits Kafka offset ── ack sent to broker
⚠ network hiccup: ack lost, broker never confirmed offset
Kafka redelivers same msg abc123 → Sink receives again
Sink executes UPSERT again → idempotent, no duplicate ✓

Key Takeaways