What is Debezium? – An Introduction to Change Data Capture

Posted on September 22, 2025 by admin

Introduction

Today, many companies need to move data quickly from one system to another. For example, when a new order is placed in an online shop, that order data must be sent to many other systems: payment, inventory, shipping, and reporting. Doing this manually is slow, complex, and error-prone. This is where Debezium comes in.

What is Debezium?

Debezium is an open-source tool that watches a database and captures every change made to its data. This process is called Change Data Capture (CDC).

In simple words, Debezium can:

  • Detect when new data is added (INSERT).
  • Detect when existing data is changed (UPDATE).
  • Detect when data is removed (DELETE).

Then, Debezium sends these changes to other systems in real time.
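To make this concrete, here is a rough sketch of the event envelope Debezium produces for a single change. The table and values are invented for illustration, and real events can also carry a schema section depending on the converter settings.

# A simplified Debezium change event for an UPDATE on a hypothetical "customers" table.
change_event = {
    "before": {"id": 42, "email": "old@example.com"},  # row state before the change (None for an INSERT)
    "after":  {"id": 42, "email": "new@example.com"},  # row state after the change (None for a DELETE)
    "source": {"connector": "mysql", "db": "shop", "table": "customers"},  # where the change came from
    "op": "u",               # "c" = insert, "u" = update, "d" = delete, "r" = snapshot read
    "ts_ms": 1726984800000,  # when Debezium processed the change
}

# A consumer can branch on the operation type:
if change_event["op"] in ("c", "r"):
    print("insert/snapshot:", change_event["after"])
elif change_event["op"] == "u":
    print("update:", change_event["before"], "->", change_event["after"])
elif change_event["op"] == "d":
    print("delete:", change_event["before"])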

What is Change Data Capture (CDC)?

Change Data Capture means listening to a database and recording every change.

  • Example: In a shop database, a new customer buys a product. CDC will capture that new record.
  • If the customer changes their address, CDC will capture the update.
  • If the record is deleted, CDC will capture the delete.

This is very useful for keeping systems synchronized and always up to date.

How Debezium Works

Debezium connects to the transaction log of a database.

  • For MySQL, it reads the binlog.
  • For PostgreSQL, it reads the WAL (Write-Ahead Log).

These logs store every change that happens in the database. Debezium then sends the change events to a system like Apache Kafka, which can deliver them to many applications at once.
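As a minimal sketch, a connector is usually registered through the Kafka Connect REST API. Everything below (hostnames, credentials, database and table names, topic prefix) is a placeholder, and the exact configuration keys can vary between Debezium versions.

import requests

# Configuration for Debezium's MySQL connector; all values are placeholders.
connector = {
    "name": "shop-mysql-connector",
    "config": {
        "connector.class": "io.debezium.connector.mysql.MySqlConnector",
        "database.hostname": "mysql",
        "database.port": "3306",
        "database.user": "debezium",
        "database.password": "secret",
        "database.server.id": "184054",       # unique id for this binlog client
        "topic.prefix": "shop",               # change events land in topics named shop.<db>.<table>
        "table.include.list": "shop.orders",  # capture only the orders table
        "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
        "schema.history.internal.kafka.topic": "schema-changes.shop",
    },
}

# Register the connector with the Kafka Connect REST API (default port 8083).
resp = requests.post("http://localhost:8083/connectors", json=connector)
resp.raise_for_status()
print(resp.json())

Once registered, the connector starts reading the binlog and publishing change events to Kafka.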

Core Flow

Database → Debezium → Kafka → Consumer

flowchart LR

  %% Sources
  subgraph SRC[Source Databases]
    A1[MySQL binlog]
    A2[PostgreSQL WAL]
    A3[MongoDB oplog]
    A4[SQL Server CDC]
  end

  %% Debezium Connect
  subgraph DBZ[Debezium Connectors]
    C1[MySQL Connector]
    C2[Postgres Connector]
    C3[MongoDB Connector]
    C4[SQL Server Connector]
  end

  %% Kafka
  subgraph KAFKA[Apache Kafka]
    T1[Topic: table1]
    T2[Topic: table2]
  end

  %% Consumers
  subgraph SINKS[Downstream Systems]
    S1[Stream Processor - Flink]
    S2[Search Index - Elasticsearch]
    S3[Data Warehouse - BigQuery]
    S4[Microservices]
    S5[Cache - Redis]
  end

  %% Edges
  A1 --> C1
  A2 --> C2
  A3 --> C3
  A4 --> C4

  C1 --> T1
  C2 --> T1
  C3 --> T2
  C4 --> T2

  T1 --> S1
  T1 --> S2
  T1 --> S3
  T1 --> S4
  T2 --> S5

Source Databases (Left side)

  • These are the original systems where data is stored, such as MySQL, PostgreSQL, MongoDB, or SQL Server.
  • Each database writes changes (insert, update, delete) into its transaction log (e.g., binlog for MySQL, WAL for PostgreSQL). Debezium can only read these logs if the database is configured for it; the short check below verifies the PostgreSQL setting.
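As a concrete example of such a prerequisite, Debezium's PostgreSQL connector uses logical decoding, which requires wal_level to be set to logical. A minimal check, assuming psycopg2 and placeholder connection details:

import psycopg2

# Connection details are placeholders for illustration.
conn = psycopg2.connect(host="localhost", dbname="shop", user="debezium", password="secret")
cur = conn.cursor()

cur.execute("SHOW wal_level")
print("wal_level =", cur.fetchone()[0])  # should print "logical" for the Debezium connector to work

cur.close()
conn.close()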

Debezium Connectors (Middle layer)

  • Debezium has a specific connector for each type of database.
  • The connector reads the database’s transaction log and captures every change event.
  • Example: If a new row is added to a table in MySQL, the MySQL Connector will detect this event (a quick health check over the Kafka Connect REST API is sketched below).
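Once a connector is registered (see the registration sketch earlier), the same Kafka Connect REST API reports whether it is running. The host and connector name here are placeholders.

import requests

base = "http://localhost:8083"

# List all registered connectors, then check the status of one of them.
print(requests.get(f"{base}/connectors").json())  # e.g. ["shop-mysql-connector"]

status = requests.get(f"{base}/connectors/shop-mysql-connector/status").json()
print(status["connector"]["state"])               # e.g. "RUNNING"
for task in status["tasks"]:
    print(task["id"], task["state"])              # per-task state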

Apache Kafka (Center)

  • Debezium sends the change events to Kafka topics.
  • Each table usually maps to its own topic (for example: table1 and table2).
  • Kafka acts as a buffer and message broker, allowing many consumers to read the events at their own pace; a minimal consumer sketch follows this list.
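Here is a rough sketch of such a consumer in Python, assuming the kafka-python package and a topic produced by the connector sketched earlier. The topic name, broker address, and group id are placeholders.

import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "shop.shop.orders",                   # topic written by the connector (placeholder)
    bootstrap_servers="localhost:9092",
    group_id="order-readers",
    auto_offset_reset="earliest",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")) if v else None,
)

for message in consumer:
    event = message.value
    if event is None:
        continue                           # tombstone record that follows a delete
    payload = event.get("payload", event)  # unwrap if the JSON converter includes schemas
    print(payload["op"], payload.get("after"))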

Downstream Systems (Right side)

  • These are systems or applications that consume the data changes from Kafka:
    • Stream Processors (e.g., Flink, Kafka Streams) for real-time computation.
    • Search Index (e.g., Elasticsearch) to keep search results up to date.
    • Data Warehouse (e.g., BigQuery, Snowflake) for analytics and reporting.
    • Microservices that need to react to data changes instantly.
    • Caches (e.g., Redis) to maintain fast in-memory views, as in the small sketch after this list.
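For example, a tiny cache updater might apply each change event to Redis. This is only a sketch; the key scheme and connection details are invented.

import json
import redis

# Apply a single Debezium change event (its payload) to a Redis cache.
cache = redis.Redis(host="localhost", port=6379)

def apply_to_cache(payload):
    row = payload["after"] or payload["before"]       # primary key comes from whichever side exists
    key = f"customer:{row['id']}"
    if payload["op"] == "d":
        cache.delete(key)                             # deleted rows drop out of the cache
    else:
        cache.set(key, json.dumps(payload["after"]))  # inserts and updates overwrite the cached row

# Example event, as it might arrive from the consumer loop sketched above:
apply_to_cache({"op": "u", "before": {"id": 7, "name": "Old"}, "after": {"id": 7, "name": "New"}})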