System Design Interview Roadmap

Change Data Capture: Streaming Database Changes

From the System Design Interview Roadmap Series - Part II: Data Storage

May 25, 2025

You're running an e-commerce platform handling millions of transactions daily. Every time a customer places an order, updates their profile, or modifies their cart, your database changes. Now imagine you could learn about every single change as it happens, without constantly polling your database or writing complex triggers. This is the power of Change Data Capture (CDC) – a technique that has quietly become a backbone of many modern distributed systems.

What Change Data Capture Really Is

Change Data Capture is like having a vigilant observer sitting beside your database, meticulously recording every insert, update, and delete operation as it occurs. Think of it as a detailed journal that captures not just what changed, but when it changed and often what it looked like before and after the change.

Unlike traditional batch processing, where you might export data every few hours, CDC operates in near real time, streaming each change shortly after it is committed. It's the difference between receiving a daily newspaper and having a live news ticker – both inform you, but the immediacy changes everything about how you can respond.

The Hidden Mechanics: How CDC Actually Works

Understanding CDC requires diving into the internals of how databases actually store and process changes. Most modern databases maintain what's called a "write-ahead log" (WAL) or "transaction log" – a sequential record of every operation that modifies data. This log serves as the database's memory of what happened and in what order.
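To make the idea of a sequential log concrete, here is a minimal sketch of a toy in-memory "database" that appends a record to its log before touching the stored rows. Everything in it (`ToyDatabase`, `LogRecord`, the `lsn` counter) is a hypothetical illustration, not how any particular database lays out its log on disk.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


@dataclass(frozen=True)
class LogRecord:
    """One entry in the toy log (hypothetical shape, for illustration only)."""
    lsn: int                 # log sequence number: the entry's position in the log
    op: str                  # "insert", "update", or "delete"
    table: str
    key: Any
    before: Optional[Row]    # row contents before the change, if any
    after: Optional[Row]     # row contents after the change, if any


@dataclass
class ToyDatabase:
    log: List[LogRecord] = field(default_factory=list)
    tables: Dict[str, Dict[Any, Row]] = field(default_factory=dict)

    def write(self, table: str, key: Any, row: Optional[Row]) -> LogRecord:
        """Insert or update when row is a dict; delete when row is None."""
        rows = self.tables.setdefault(table, {})
        existing = rows.get(key)
        if existing is None and row is None:
            raise KeyError(f"{table}[{key!r}] does not exist")

        if existing is None:
            op = "insert"
        elif row is None:
            op = "delete"
        else:
            op = "update"

        record = LogRecord(
            lsn=len(self.log) + 1,
            op=op,
            table=table,
            key=key,
            before=dict(existing) if existing is not None else None,
            after=dict(row) if row is not None else None,
        )
        # Step 1: append to the log first, so the change is recorded in order.
        self.log.append(record)
        # Step 2: only then apply the change to the "data pages".
        if row is None:
            del rows[key]
        else:
            rows[key] = dict(row)
        return record


db = ToyDatabase()
db.write("customers", 42, {"email": "old@example.com"})
db.write("customers", 42, {"email": "new@example.com"})
db.write("customers", 42, None)

for rec in db.log:
    print(rec.lsn, rec.op, rec.before, rec.after)
```

Because every modification passes through `write`, the log alone is enough to replay what happened and in what order, which is the property CDC relies on.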

CDC systems tap into this transaction log, parsing the low-level entries and translating them into meaningful change events. When you update a customer's email address, the database first writes this change to its log, then applies it to the actual data pages. The CDC system reads this log entry and publishes a structured event containing the new email, the old email (when the database is configured to log previous row values), a timestamp, and metadata about the operation.
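The translation step described above can be sketched as a small function that turns one decoded log entry into a structured change event. The entry layout, the field names (`lsn`, `commit_ts`, `changed_columns`), and the event shape are all assumptions made for this example; real CDC tools define their own formats.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, Optional

Row = Dict[str, Any]


def to_change_event(entry: Dict[str, Any]) -> Dict[str, Any]:
    """Translate one decoded log entry (hypothetical format) into a change event."""
    before: Optional[Row] = entry.get("before")
    after: Optional[Row] = entry.get("after")
    old, new = before or {}, after or {}

    # Columns whose value differs between the before and after images.
    changed = sorted(
        col for col in set(old) | set(new) if old.get(col) != new.get(col)
    )

    return {
        "op": entry["op"],
        "source": {"table": entry["table"], "lsn": entry["lsn"]},
        "ts": datetime.fromtimestamp(entry["commit_ts"], tz=timezone.utc).isoformat(),
        "key": entry["key"],
        "before": before,   # may be None if the log carries no previous values
        "after": after,     # None for deletes
        "changed_columns": changed,
    }


# A decoded log entry for the email update from the text (made-up values).
raw_entry = {
    "lsn": 1045,
    "op": "update",
    "table": "customers",
    "key": {"id": 42},
    "commit_ts": 1748131200,  # seconds since the Unix epoch
    "before": {"id": 42, "email": "old@example.com"},
    "after": {"id": 42, "email": "new@example.com"},
}

event = to_change_event(raw_entry)
print(json.dumps(event, indent=2))
```

A downstream consumer can act on the event without ever querying the source table, since the event carries both row images and the position in the log it came from.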

This approach is efficient because it leverages infrastructure the database already maintains for durability and recovery. You're not issuing extra queries against your tables or maintaining shadow tables – you're reading a log the database writes anyway, so the added load is typically small compared with polling or triggers.
