Introduction
Modern systems need to move data quickly, often in near real time. A small change in a database, such as a new order or an update to a customer’s profile, may need to reach many other systems within seconds: analytics dashboards, search engines, caches, or other microservices.
Debezium helps solve this challenge. It is an open-source platform for Change Data Capture (CDC): it watches databases, captures every row-level change, and streams those changes to other systems. To understand how it works, let’s look at its architecture.
How Debezium Works in Simple Terms
At its core, Debezium connects to a database's transaction log.
- For MySQL, this is called the binlog.
- For PostgreSQL, this is the Write-Ahead Log (WAL).
- For MongoDB, this is the oplog, which recent Debezium versions read through MongoDB change streams.
These logs record every change made in the database (inserts, updates, and deletes).
Debezium reads these logs, turns each change into an event, and publishes the events to a message broker, usually Apache Kafka (Debezium connectors commonly run inside Kafka Connect). From Kafka, many other systems can subscribe and react to those events.
So the flow looks like this:
Database → Debezium → Kafka → Consumers (apps, services, warehouses, dashboards).
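To make the flow concrete, here is a toy, in-memory simulation of the same pipeline. None of it is Debezium or Kafka code: the hypothetical `transaction_log` list stands in for the binlog/WAL/oplog, `capture()` plays the connector's role, and the hypothetical `ToyBroker` class mimics a Kafka topic as an append-only list that each consumer reads from its own offset.

```python
from collections import defaultdict

# Hypothetical stand-in for a transaction log: ordered row changes with
# before/after images. Real logs (binlog, WAL, oplog) are binary formats.
transaction_log = [
    {"table": "orders", "before": None, "after": {"id": 101, "status": "NEW"}},
    {"table": "orders", "before": {"id": 101, "status": "NEW"},
     "after": {"id": 101, "status": "PAID"}},
    {"table": "orders", "before": {"id": 101, "status": "PAID"}, "after": None},
]


class ToyBroker:
    """In-memory stand-in for Kafka: each topic is an append-only list of events."""

    def __init__(self):
        self.topics = defaultdict(list)

    def publish(self, topic, event):
        self.topics[topic].append(event)

    def read(self, topic, offset):
        # Each consumer tracks its own offset, so many consumers can share a topic.
        return self.topics[topic][offset:]


def op_code(entry):
    """Derive Debezium-style op codes: c = create, u = update, d = delete."""
    if entry["before"] is None:
        return "c"
    if entry["after"] is None:
        return "d"
    return "u"


def capture(log, broker):
    """Plays the connector's role: turn each log entry, in order, into an event."""
    for entry in log:
        event = {"op": op_code(entry), "before": entry["before"], "after": entry["after"]}
        broker.publish(entry["table"], event)


broker = ToyBroker()
capture(transaction_log, broker)

# Two independent consumers: one reads from the start, one has already seen two events.
dashboard_events = broker.read("orders", 0)
search_events = broker.read("orders", 2)
print([e["op"] for e in dashboard_events])  # ['c', 'u', 'd']
print(len(search_events))  # 1
```

The offset-based `read` is the detail worth noticing: because the topic keeps every event, a slow consumer or a new one can catch up independently of the others.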
Debezium Architecture Diagram
```mermaid
flowchart LR
    subgraph DB[Source Databases]
        A1[MySQL binlog]
        A2[PostgreSQL WAL]
        A3[MongoDB oplog]
    end
    subgraph DBZ[Debezium Connectors]
        C1[MySQL Connector]
        C2[Postgres Connector]
        C3[MongoDB Connector]
    end
    subgraph KAFKA[Apache Kafka]
        T1[Topic: orders]
        T2[Topic: customers]
    end
    subgraph SINKS[Consumers]
        S1[Stream Processing - Flink]
        S2[Search Engine - Elasticsearch]
        S3[Data Warehouse - BigQuery]
        S4[Microservices]
    end
    A1 --> C1
    A2 --> C2
    A3 --> C3
    C1 --> T1
    C2 --> T1
    C3 --> T2
    T1 --> S1
    T1 --> S2
    T1 --> S3
    T1 --> S4
    T2 --> S4
```
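The topic names in the diagram are shortened for readability. With Debezium's default topic naming, an event lands in a topic built from the connector's `topic.prefix` (reported as `name` in the event's `source` block), then the schema or database, then the table or collection. The PostgreSQL `orders` table from the examples below would therefore go to `dbserver1.public.orders`. The `topic_for` helper is a hypothetical sketch that rebuilds this name from a `source` block; it assumes the default naming strategy and no topic-routing transforms.

```python
def topic_for(source: dict) -> str:
    """Rebuild the default Debezium topic name from an event's "source" block.

    Hypothetical helper; assumes the default topic naming strategy and no
    routing transforms that rename topics.
    """
    connector = source["connector"]
    prefix = source["name"]  # matches the connector's topic.prefix setting
    if connector == "postgresql":
        return f"{prefix}.{source['schema']}.{source['table']}"
    if connector == "mysql":
        return f"{prefix}.{source['db']}.{source['table']}"
    if connector == "mongodb":
        return f"{prefix}.{source['db']}.{source['collection']}"
    raise ValueError(f"unsupported connector: {connector}")


# The "source" block of the PostgreSQL INSERT example below, trimmed.
source = {"connector": "postgresql", "name": "dbserver1", "db": "shopdb",
          "schema": "public", "table": "orders"}
print(topic_for(source))  # dbserver1.public.orders
```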
Core Components of Debezium
Source Databases
These are the systems that hold the original data, such as:
- MySQL
- PostgreSQL
- MongoDB
- SQL Server
- Oracle
Debezium connects to them to capture changes.
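In a typical deployment, you connect Debezium to one of these databases by registering a connector through Kafka Connect's REST API. The sketch below builds a registration body for the PostgreSQL connector and POSTs it with only the Python standard library. It assumes a Kafka Connect worker at `http://localhost:8083` with the Debezium PostgreSQL plugin installed; the connector name, host, user, and password are placeholders, while the database, prefix, and table are chosen to match the `shopdb` / `dbserver1` / `orders` examples below.

```python
import json
import urllib.request


def postgres_connector_payload(name: str, host: str, user: str, password: str) -> dict:
    """Build a Kafka Connect registration body for Debezium's PostgreSQL connector."""
    return {
        "name": name,
        "config": {
            "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
            "plugin.name": "pgoutput",  # PostgreSQL's built-in logical decoding plugin
            "database.hostname": host,
            "database.port": "5432",
            "database.user": user,
            "database.password": password,
            "database.dbname": "shopdb",
            "topic.prefix": "dbserver1",  # shows up as "name" in each event's source block
            "table.include.list": "public.orders",
        },
    }


def register(payload: dict, connect_url: str = "http://localhost:8083") -> int:
    """POST the payload to Kafka Connect's /connectors endpoint; return the HTTP status."""
    request = urllib.request.Request(
        f"{connect_url}/connectors",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


# Placeholder host and credentials for illustration only.
payload = postgres_connector_payload("orders-connector", "postgres.example.internal",
                                     "debezium", "change-me")
# register(payload)  # requires a running Kafka Connect worker
```

Kafka Connect answers a successful creation with HTTP 201. Keeping `table.include.list` narrow limits capture to the tables consumers actually need.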
Example: Debezium JSON Events from the PostgreSQL WAL
1. INSERT Event
```json
{
  "before": null,
  "after": {
    "id": 101,
    "customer_id": 12,
    "total": 250.00,
    "status": "NEW"
  },
  "source": {
    "version": "2.5.0.Final",
    "connector": "postgresql",
    "name": "dbserver1",
    "ts_ms": 1695804000000,
    "db": "shopdb",
    "schema": "public",
    "table": "orders",
    "lsn": "0/16B6C50",
    "txId": 5432
  },
  "op": "c",
  "ts_ms": 1695804000100
}
```
Explanation:
- "before": null → there was no record before.
- "after": {...} → the new row after the insert.
- "op": "c" → stands for create (insert).
- This event tells consumers that a new order was added to the orders table.
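A consumer typically starts by decoding the event and branching on `op`. The sketch below summarizes an event like the one above using only the standard library. It assumes the event reaches the consumer as plain JSON; with Kafka Connect's JSON converter and schemas enabled, the same envelope arrives wrapped in a `payload` field, which the code unwraps. `summarize` is a hypothetical helper name, and reading `schema` from the source block is PostgreSQL-specific.

```python
import json

# Debezium op codes; "r" marks rows read during an initial snapshot.
OP_NAMES = {"c": "create", "u": "update", "d": "delete", "r": "read (snapshot)"}


def summarize(raw: str) -> str:
    event = json.loads(raw)
    # With schemas enabled, the change envelope sits under "payload".
    if "payload" in event:
        event = event["payload"]
    source = event["source"]
    op = OP_NAMES.get(event["op"], event["op"])
    # Inserts and updates carry the row in "after"; deletes only in "before".
    row = event["after"] if event["after"] is not None else event["before"]
    return f"{op} on {source['schema']}.{source['table']}: id={row['id']}"


insert_event = json.dumps({
    "before": None,
    "after": {"id": 101, "customer_id": 12, "total": 250.00, "status": "NEW"},
    "source": {"connector": "postgresql", "db": "shopdb", "schema": "public", "table": "orders"},
    "op": "c",
})
print(summarize(insert_event))  # create on public.orders: id=101
```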
2. UPDATE Event
```json
{
  "before": {
    "id": 101,
    "customer_id": 12,
    "total": 250.00,
    "status": "NEW"
  },
  "after": {
    "id": 101,
    "customer_id": 12,
    "total": 250.00,
    "status": "PAID"
  },
  "source": {
    "version": "2.5.0.Final",
    "connector": "postgresql",
    "name": "dbserver1",
    "ts_ms": 1695804200000,
    "db": "shopdb",
    "schema": "public",
    "table": "orders",
    "lsn": "0/16B6E28",
    "txId": 5433
  },
  "op": "u",
  "ts_ms": 1695804200123
}
```
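Explanation:
- "before": {...} → the row before the update (status = "NEW").
- "after": {...} → the row after the update (status = "PAID").
- "op": "u" → stands for update.
- This event shows that the order’s status changed from NEW to PAID.
- For PostgreSQL, a full "before" image like this one requires the table to use REPLICA IDENTITY FULL. With the default setting, "before" carries at most the primary-key columns; the same limitation applies to delete events.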
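Because an update event carries both row images, a consumer can work out which columns changed by comparing them. The hypothetical `changed_fields` helper below sketches this; it is only precise when `before` holds the full old row (for PostgreSQL, a table with `REPLICA IDENTITY FULL`).

```python
def changed_fields(event: dict) -> dict:
    """Return {column: (old, new)} for every column whose value differs.

    If "before" is null or holds only key columns, the old values of other
    columns are unknown, so those columns will generally be reported too.
    """
    before = event.get("before") or {}
    after = event.get("after") or {}
    return {
        column: (before.get(column), new_value)
        for column, new_value in after.items()
        if before.get(column) != new_value
    }


update_event = {
    "before": {"id": 101, "customer_id": 12, "total": 250.00, "status": "NEW"},
    "after": {"id": 101, "customer_id": 12, "total": 250.00, "status": "PAID"},
    "op": "u",
}
print(changed_fields(update_event))  # {'status': ('NEW', 'PAID')}
```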
3. DELETE Event
```json
{
  "before": {
    "id": 101,
    "customer_id": 12,
    "total": 250.00,
    "status": "PAID"
  },
  "after": null,
  "source": {
    "version": "2.5.0.Final",
    "connector": "postgresql",
    "name": "dbserver1",
    "ts_ms": 1695804300000,
    "db": "shopdb",
    "schema": "public",
    "table": "orders",
    "lsn": "0/16B6F90",
    "txId": 5434
  },
  "op": "d",
  "ts_ms": 1695804300456
}
```
Explanation:
- "before": {...} → the row before deletion (the full order record).
- "after": null → the row no longer exists.
- "op": "d" → stands for delete.
- This event tells consumers that the order was removed from the orders table.
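Together, these three event types are enough to keep a copy of the table in sync, which is how caches and search indexes typically consume Debezium topics. The sketch below applies events to an in-memory dict keyed by primary key. It assumes `id` is the primary key, treats snapshot reads (`"r"`) like inserts, and skips the null-valued tombstone record that Debezium emits after a delete by default (controlled by the connector's `tombstones.on.delete` option). `apply_event` is a hypothetical helper name.

```python
from typing import Optional


def apply_event(cache: dict, event: Optional[dict]) -> None:
    """Apply one Debezium change event to an in-memory copy of a table keyed by "id"."""
    if event is None:
        # Tombstone (null value) following a delete: nothing left to apply.
        return
    op = event["op"]
    if op in ("c", "r", "u"):
        row = event["after"]
        cache[row["id"]] = row  # insert, or overwrite with the newest row image
    elif op == "d":
        # For deletes, the key is only available in "before".
        cache.pop(event["before"]["id"], None)


orders = {}
apply_event(orders, {"op": "c", "before": None,
                     "after": {"id": 101, "customer_id": 12, "total": 250.00, "status": "NEW"}})
apply_event(orders, {"op": "u",
                     "before": {"id": 101, "customer_id": 12, "total": 250.00, "status": "NEW"},
                     "after": {"id": 101, "customer_id": 12, "total": 250.00, "status": "PAID"}})
print(orders[101]["status"])  # PAID
apply_event(orders, {"op": "d",
                     "before": {"id": 101, "customer_id": 12, "total": 250.00, "status": "PAID"},
                     "after": None})
apply_event(orders, None)  # tombstone
print(orders)  # {}
```

Overwriting with the full `after` image, rather than patching individual fields, keeps the sketch correct even if an earlier event was missed, as long as events for the same key arrive in order.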