Widhian Bramantya

Debezium Architecture – How It Works and Core Components

Posted on September 27, 2025 by admin

Introduction

Modern systems need to move data quickly, often in near real time. A small change in a database, like a new order or an update to a customer’s profile, must reach many other systems with little delay: analytics dashboards, search engines, caches, or other microservices.

Debezium helps solve this challenge. It is an open-source platform for Change Data Capture (CDC). Debezium watches databases, captures every change, and sends the changes to other systems. To understand how it works, let’s look at its architecture.

How Debezium Works in Simple Terms

At its core, Debezium connects to a database transaction log.

  • For MySQL, this is called the binlog.
  • For PostgreSQL, this is the Write-Ahead Log (WAL).
  • For MongoDB, this is the oplog, which recent Debezium versions read through change streams.

These logs contain every change made in the database (insert, update, delete).
Debezium reads from these logs and then sends events to a message broker, usually Apache Kafka. From Kafka, many other systems can subscribe and react to those events.

So the flow looks like this:

Database → Debezium → Kafka → Consumers (apps, services, warehouses, dashboards).
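
To make the fan-out idea concrete, here is a toy in-memory sketch of the flow above. It does not use Kafka or Debezium at all: `ToyTopic` and `capture` are hypothetical names invented for this illustration, and the "log entries" are simple tuples. The point is only that events are appended once and each consumer reads them at its own pace.

```python
from collections import defaultdict


class ToyTopic:
    """In-memory stand-in for a Kafka topic: an append-only list of events."""

    def __init__(self):
        self.events = []
        # Each consumer keeps its own position, so consumers do not
        # interfere with each other.
        self.offsets = defaultdict(int)

    def publish(self, event):
        self.events.append(event)

    def poll(self, consumer):
        """Return the events this consumer has not seen yet."""
        start = self.offsets[consumer]
        batch = self.events[start:]
        self.offsets[consumer] = len(self.events)
        return batch


def capture(log_entries, topic):
    """Play the role of Debezium: turn log entries into events on a topic."""
    for op, row in log_entries:
        topic.publish({"op": op, "row": row})


topic = ToyTopic()
capture(
    [
        ("c", {"id": 101, "status": "NEW"}),
        ("u", {"id": 101, "status": "PAID"}),
    ],
    topic,
)

# Two independent consumers both receive the full stream of changes.
dashboard_batch = topic.poll("dashboard")
search_batch = topic.poll("search")
```

A real deployment replaces `ToyTopic` with Kafka topics and consumer groups, but the shape is the same: one stream of changes, many readers.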

Debezium Architecture Diagram

flowchart LR
  direction LR

  subgraph DB[Source Databases]
    A1[MySQL binlog]
    A2[PostgreSQL WAL]
    A3[MongoDB oplog]
  end

  subgraph DBZ[Debezium Connectors]
    C1[MySQL Connector]
    C2[Postgres Connector]
    C3[MongoDB Connector]
  end

  subgraph KAFKA[Apache Kafka]
    T1[Topic: orders]
    T2[Topic: customers]
  end

  subgraph SINKS[Consumers]
    S1[Stream Processing - Flink]
    S2[Search Engine - Elasticsearch]
    S3[Data Warehouse - BigQuery]
    S4[Microservices]
  end

  A1 --> C1
  A2 --> C2
  A3 --> C3

  C1 --> T1
  C2 --> T1
  C3 --> T2

  T1 --> S1
  T1 --> S2
  T1 --> S3
  T1 --> S4
  T2 --> S4

Core Components of Debezium

Source Databases

These are the systems that hold the original data, such as:

  • MySQL
  • PostgreSQL
  • MongoDB
  • SQL Server
  • Oracle

Debezium connects to them to capture changes.
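
As a rough sketch of what "connecting" means in practice, a Debezium connector is usually registered by sending a JSON document to the Kafka Connect REST API. The snippet below only builds that document for a PostgreSQL source. The property names follow Debezium 2.x, while the helper name `postgres_connector_config`, the connector name, the host, and the credentials are placeholders assumed for this example.

```python
import json


def postgres_connector_config(name, host, dbname, topic_prefix, tables):
    """Build a Kafka Connect registration payload for a Debezium Postgres connector."""
    return {
        "name": name,
        "config": {
            "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
            # pgoutput is the logical decoding plugin built into PostgreSQL 10+.
            "plugin.name": "pgoutput",
            "database.hostname": host,
            "database.port": "5432",
            # Placeholder credentials, assumed for this example only.
            "database.user": "debezium",
            "database.password": "change-me",
            "database.dbname": dbname,
            # Prefix used when Debezium names its Kafka topics.
            "topic.prefix": topic_prefix,
            "table.include.list": ",".join(tables),
        },
    }


payload = postgres_connector_config(
    name="orders-connector",
    host="localhost",
    dbname="shopdb",
    topic_prefix="dbserver1",
    tables=["public.orders"],
)
body = json.dumps(payload, indent=2)
print(body)
```

The resulting JSON would typically be sent with a POST request to the `/connectors` endpoint of a Kafka Connect worker. Check the connector documentation for your Debezium version before relying on any specific property.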

Example: Debezium JSON Events from WAL

1. INSERT Event

{
  "before": null,
  "after": {
    "id": 101,
    "customer_id": 12,
    "total": 250.00,
    "status": "NEW"
  },
  "source": {
    "version": "2.5.0.Final",
    "connector": "postgresql",
    "name": "dbserver1",
    "ts_ms": 1695804000000,
    "db": "shopdb",
    "schema": "public",
    "table": "orders",
    "lsn": 23817296,
    "txId": 5432
  },
  "op": "c",
  "ts_ms": 1695804000100
}

Explanation:

  • "before": null → there was no record before.
  • "after": {...} → the new row after the insert.
  • "op": "c" → stands for create (insert).
  • This event tells consumers that a new order was added to the orders table.
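
The metadata in the event is useful on its own. The sketch below parses a trimmed copy of the insert event and derives two things: the Kafka topic the PostgreSQL connector would use by default (prefix, schema, and table joined by dots) and the delay between the database change and Debezium processing it. The helper names `topic_for` and `capture_lag_ms` are made up for this example.

```python
import json

# Trimmed copy of the INSERT event shown above.
INSERT_EVENT = json.loads("""
{
  "before": null,
  "after": {"id": 101, "customer_id": 12, "total": 250.00, "status": "NEW"},
  "source": {
    "connector": "postgresql",
    "name": "dbserver1",
    "ts_ms": 1695804000000,
    "db": "shopdb",
    "schema": "public",
    "table": "orders"
  },
  "op": "c",
  "ts_ms": 1695804000100
}
""")


def topic_for(event):
    """Default topic name for a PostgreSQL change event: prefix.schema.table."""
    source = event["source"]
    return f"{source['name']}.{source['schema']}.{source['table']}"


def capture_lag_ms(event):
    """Milliseconds between the change in the database and Debezium handling it."""
    return event["ts_ms"] - event["source"]["ts_ms"]


print(topic_for(INSERT_EVENT))
print(capture_lag_ms(INSERT_EVENT))
```

Watching this lag over time is a simple way to notice when a connector is falling behind the database.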

2. UPDATE Event

{
  "before": {
    "id": 101,
    "customer_id": 12,
    "total": 250.00,
    "status": "NEW"
  },
  "after": {
    "id": 101,
    "customer_id": 12,
    "total": 250.00,
    "status": "PAID"
  },
  "source": {
    "version": "2.5.0.Final",
    "connector": "postgresql",
    "name": "dbserver1",
    "ts_ms": 1695804200000,
    "db": "shopdb",
    "schema": "public",
    "table": "orders",
    "lsn": 23817768,
    "txId": 5433
  },
  "op": "u",
  "ts_ms": 1695804200123
}

Explanation:

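  • "before": {...} → the row before the update (status = "NEW"). PostgreSQL provides the full previous row only when the table uses REPLICA IDENTITY FULL; with the default setting, the previous values of non-key columns are not available.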
  • "after": {...} → the row after the update (status = "PAID").
  • "op": "u" → stands for update.
  • This event shows that the order’s status changed from NEW to PAID.
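
Consumers often care only about what changed. A minimal sketch, assuming both "before" and "after" are populated as in the example above, compares the two row images and returns the columns whose values differ. The function name `changed_fields` is hypothetical.

```python
def changed_fields(event):
    """Map each changed column to its (old, new) pair of values."""
    before = event.get("before") or {}
    after = event.get("after") or {}
    columns = sorted(before.keys() | after.keys())
    return {
        column: (before.get(column), after.get(column))
        for column in columns
        if before.get(column) != after.get(column)
    }


update_event = {
    "before": {"id": 101, "customer_id": 12, "total": 250.00, "status": "NEW"},
    "after": {"id": 101, "customer_id": 12, "total": 250.00, "status": "PAID"},
    "op": "u",
}

print(changed_fields(update_event))
```

If "before" is null, this function reports every column as changed from None, so it is only meaningful when the previous row image is actually captured.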

3. DELETE Event

{
  "before": {
    "id": 101,
    "customer_id": 12,
    "total": 250.00,
    "status": "PAID"
  },
  "after": null,
  "source": {
    "version": "2.5.0.Final",
    "connector": "postgresql",
    "name": "dbserver1",
    "ts_ms": 1695804300000,
    "db": "shopdb",
    "schema": "public",
    "table": "orders",
    "lsn": 23818128,
    "txId": 5434
  },
  "op": "d",
  "ts_ms": 1695804300456
}

Explanation:

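  • "before": {...} → the row before deletion. The full order record appears only when the table uses REPLICA IDENTITY FULL; with the default setting, only the primary key columns carry their previous values.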
  • "after": null → the row no longer exists.
  • "op": "d" → stands for delete.
  • This event tells consumers that the order was removed from the orders table.
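
Putting the three event types together, a consumer can keep its own copy of the table up to date. The sketch below applies events to a plain dictionary keyed by "id", which is assumed here to be the primary key. It also accepts "r", the op value Debezium uses for rows read during an initial snapshot. The function name `apply_event` is made up for this example.

```python
def apply_event(table, event):
    """Apply one Debezium change event to an in-memory copy of the table."""
    op = event["op"]
    if op in ("c", "r", "u"):
        # Create, snapshot read, and update all carry the new row in "after".
        row = event["after"]
        table[row["id"]] = row
    elif op == "d":
        # Delete carries the old row (at least its key) in "before".
        table.pop(event["before"]["id"], None)
    else:
        raise ValueError(f"unsupported op: {op!r}")


orders = {}
events = [
    {"op": "c", "before": None, "after": {"id": 101, "status": "NEW"}},
    {"op": "u", "before": {"id": 101, "status": "NEW"},
     "after": {"id": 101, "status": "PAID"}},
]
for event in events:
    apply_event(orders, event)
status_after_update = orders[101]["status"]

apply_event(orders, {"op": "d", "before": {"id": 101, "status": "PAID"}, "after": None})
print(orders)
```

Writing inserts and updates as the same overwrite keeps the handler safe when an event is delivered more than once, which can happen with at-least-once delivery.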