
Kafka Architecture Explained: Brokers, Topics, Partitions, and Offsets

Posted on September 14, 2025 by admin

What is an Offset?

Inside every partition, each message has a number called an offset.

  • Offsets are sequential numbers: 0, 1, 2, 3, …
  • They show the position of a message in the partition log.

Offset as a Bookmark

  • Think of an offset as a bookmark in a book.
  • Each consumer keeps its own bookmark; Kafka does not delete or mark messages as read when they are consumed.
  • This means:
    • Consumer A may stop at offset 15.
    • Consumer B may stop at offset 42.
  • Both can read the same partition independently.

Important: Offsets are per consumer group. This is why multiple teams/apps can consume the same data without disturbing each other.
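To make this concrete, here is a minimal consumer sketch using the Java Kafka client; the topic name (orders), group id (analytics), and broker address are placeholder values. Each record carries the partition it came from and its offset, and a second application with a different group.id would keep its own, independent offsets for the same partitions:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class OffsetBookmarkExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder broker
        props.put("group.id", "analytics");                  // hypothetical group; "billing" would track its own offsets
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());
        props.put("enable.auto.commit", "false");            // commit the "bookmark" explicitly

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders"));            // hypothetical topic
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> r : records) {
                // Each record exposes the partition and its sequential offset inside that partition.
                System.out.printf("partition=%d offset=%d value=%s%n", r.partition(), r.offset(), r.value());
            }
            consumer.commitSync();                            // record this group's position so it can resume later
        }
    }
}
```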

Why Do We Need Offsets?

  1. Resume After Failure
    • If a consumer crashes and restarts, it knows where to continue (from the last saved offset).
    • This improves availability because no data is lost, and no message is skipped.
  2. Replayability
    • Consumers can reset their offset and re-read old messages.
    • Useful for debugging, re-processing, or training new systems (see the seek sketch after this list).
  3. Multiple Consumers, Different Needs
    • One consumer group (e.g., analytics) may want to read all data again.
    • Another group (e.g., billing) just continues where it left off.
    • Offsets make this flexibility possible.
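As a rough sketch of the replay case, the Java client lets a consumer move its bookmark explicitly with seek or seekToBeginning; the topic, partition number, and target offset below are made-up values:

```java
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ReplayExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder broker
        props.put("group.id", "analytics-replay");           // hypothetical group that re-reads history
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        TopicPartition tp = new TopicPartition("orders", 0); // hypothetical topic/partition
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.assign(List.of(tp));                     // manual assignment so we can seek freely
            consumer.seekToBeginning(List.of(tp));            // replay everything still retained...
            // ...or jump to a specific bookmark instead:
            // consumer.seek(tp, 15);
            long position = consumer.position(tp);            // where the next poll() will start reading
            System.out.println("next offset to read: " + position);
        }
    }
}
```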

Offsets and Data Retention

  • Kafka keeps messages for a certain retention period (e.g., 7 days).
  • If a consumer’s bookmark (offset) points to a message that has already expired, the consumer cannot read it anymore.
  • Longer retention = more storage cost but more flexibility (replay, late consumers).
  • Short retention = less storage cost but less replayability.

Trade-off:

  • Keep data longer if many consumers need history or reprocessing.
  • Keep data shorter if you only need real-time processing and want to save disk space (a retention-config sketch follows).
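Retention is a per-topic setting. Here is a minimal sketch of creating a topic with 7-day retention via the Java AdminClient; the topic name, partition count, and replication factor are placeholders:

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class RetentionExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");    // placeholder broker

        try (AdminClient admin = AdminClient.create(props)) {
            // Hypothetical topic: 3 partitions, replication factor 1,
            // messages kept for 7 days (retention.ms is in milliseconds).
            NewTopic topic = new NewTopic("orders", 3, (short) 1)
                    .configs(Map.of("retention.ms", String.valueOf(7L * 24 * 60 * 60 * 1000)));
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}
```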

In short: Offsets give each consumer group a personal bookmark. This enables high availability, replayability, and flexibility — but the value of offsets is always tied to how long Kafka keeps the data (retention policy).

Kafka Lag

Kafka lag is the difference between:

  • the latest offset in a partition (end of log), and
  • the current offset of a consumer group.

Example:

  • Partition has 1,000 messages (latest offset = 999).
  • Consumer has read up to offset 950.
  • Lag = 999 – 950 = 49 messages (the sketch below computes this from live offsets).
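A minimal sketch of measuring lag with the Java client, assuming a placeholder topic, partition, and group id; it subtracts the group's committed offset from the end of the log, which is the same number tools like kafka-consumer-groups.sh report as LAG:

```java
import java.util.Map;
import java.util.Properties;
import java.util.Set;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class LagExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder broker
        props.put("group.id", "analytics");                  // hypothetical group whose lag we inspect
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        TopicPartition tp = new TopicPartition("orders", 0); // hypothetical topic/partition
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            long endOffset = consumer.endOffsets(Set.of(tp)).get(tp);           // end of the log
            Map<TopicPartition, OffsetAndMetadata> committed = consumer.committed(Set.of(tp));
            OffsetAndMetadata bookmark = committed.get(tp);                     // null if nothing committed yet
            long currentOffset = (bookmark == null) ? 0 : bookmark.offset();
            System.out.println("lag = " + (endOffset - currentOffset));         // messages not yet consumed
        }
    }
}
```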

Why Lag Matters

  • High lag = consumer is too slow, falling behind the stream.
  • Low lag = consumer is keeping up with the producer speed.
  • If retention expires before a lagging consumer catches up, it will lose data.