Kafka Architecture Explained: Brokers, Topics, Partitions, and Offsets – Page 2

What is an Offset?

Inside every partition, each message has a number called an offset.

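Offsets are sequential numbers: 0, 1, 2, 3, …
They show the position of a message in its partition's log. Offsets are unique only within a partition: partition 0 and partition 1 of the same topic each have their own offset 0.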
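
To make the numbering concrete, here is a minimal Python sketch of a single partition. It is a toy in-memory model, not Kafka's storage format; PartitionLog, append, and read are hypothetical names chosen for illustration.

```python
class PartitionLog:
    """Toy model of one partition: an append-only, in-memory list of messages."""

    def __init__(self):
        self._messages = []

    def append(self, message):
        # The new message's offset is simply its position in the log.
        offset = len(self._messages)
        self._messages.append(message)
        return offset

    def read(self, offset):
        # Reading does not remove the message, so any offset can be read again.
        return self._messages[offset]


log = PartitionLog()
offsets = [log.append(m) for m in ["a", "b", "c", "d"]]
print(offsets)      # [0, 1, 2, 3]
print(log.read(2))  # c
```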

Offset as a Bookmark

Think of an offset as a bookmark in a book.
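Each consumer group keeps its own bookmark (its committed offset). Consuming a message does not delete it or mark it as read for anyone else; Kafka just stores each group's bookmark for it, in an internal topic named __consumer_offsets.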
This means:
- Consumer A (in one consumer group) may stop at offset 15.
- Consumer B (in another consumer group) may stop at offset 42.
Both can read the same partition independently.

Important: committed offsets are tracked per consumer group and per partition. This is why multiple teams or applications can consume the same data without disturbing each other.
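
The sketch below models two consumer groups reading one partition. The function poll and the dictionary committed are hypothetical stand-ins: a real Kafka client has its committed offsets stored on the broker, keyed by group, topic, and partition. Note that reading never removes messages from the log; each group only moves its own number forward.

```python
# Toy model: one partition of one topic, holding offsets 0..49.
partition_log = [f"event-{i}" for i in range(50)]

# Hypothetical bookmark store: consumer group name -> next offset to read.
committed = {}


def poll(group, max_records):
    """Return the next batch for a group and move only that group's bookmark."""
    start = committed.get(group, 0)  # unknown groups start at 0 in this toy
    batch = partition_log[start:start + max_records]
    committed[group] = start + len(batch)  # next offset this group will read
    return batch


analytics_batch = poll("analytics", 15)
billing_batch = poll("billing", 42)
print(committed)  # {'analytics': 15, 'billing': 42}
```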

Why Do We Need Offsets?

Resume After Failure
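- If a consumer crashes and restarts, it knows where to continue: from the last committed offset.
- As long as offsets are committed after processing, no message is skipped and no data is lost. The trade-off is that messages processed after the last commit may be processed again (at-least-once delivery).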
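
The following Python sketch simulates a crash to show what resuming from the last committed offset means in practice. It is a toy model with a hypothetical run function, not a Kafka client: each message is processed first and committed second, so the message being handled at the moment of the crash is processed twice, but none is skipped.

```python
partition_log = ["m0", "m1", "m2", "m3", "m4"]
processed = []  # side effects, e.g. rows written to a database


def run(committed, crash_after=None):
    """Process from the committed offset, committing after each message.

    crash_after simulates a crash right after processing that offset,
    before its commit. Returns the last committed offset.
    """
    offset = committed  # resume from the last committed offset
    while offset < len(partition_log):
        processed.append(partition_log[offset])  # 1. process the message
        if offset == crash_after:
            return committed  # crash: processed, but the commit never happened
        offset += 1
        committed = offset  # 2. commit the next offset to read
    return committed


committed = run(0, crash_after=2)  # crashes after processing m2
committed = run(committed)         # restart resumes at offset 2
print(processed)  # ['m0', 'm1', 'm2', 'm2', 'm3', 'm4']
```

Committing before processing reverses the trade-off: a crash between the commit and the processing step skips that message instead of repeating it (at-most-once delivery).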
Replayability
- Consumers can reset their offset and re-read old messages.
- Useful for debugging, re-processing after a bug fix, or feeding historical data into a new system.
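
A minimal sketch of replay, using a hypothetical ToyConsumer class: resetting is nothing more than moving the bookmark backwards. Real clients offer comparable operations (for example seek on the Java KafkaConsumer, or the --reset-offsets option of the kafka-consumer-groups.sh tool for a whole group), but the class below is only an in-memory model.

```python
partition_log = ["signup", "login", "purchase", "logout"]


class ToyConsumer:
    """Hypothetical consumer tracking its position in a single partition."""

    def __init__(self):
        self.position = 0  # next offset to read

    def seek(self, offset):
        # Only the bookmark moves; the messages in the log are untouched.
        self.position = offset

    def poll(self):
        batch = partition_log[self.position:]
        self.position = len(partition_log)
        return batch


consumer = ToyConsumer()
first_pass = consumer.poll()  # reads offsets 0..3
consumer.seek(1)              # rewind to offset 1
replay = consumer.poll()
print(replay)  # ['login', 'purchase', 'logout']
```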
Multiple Consumers, Different Needs
- One consumer group (e.g., analytics) may want to read all data again.
- Another group (e.g., billing) just continues where it left off.
- Offsets make this flexibility possible.

Offsets and Data Retention

Kafka keeps messages for a certain retention period (e.g., 7 days).
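If a consumer’s bookmark (offset) points to a message that retention has already deleted, the consumer cannot read it anymore. Its offset is now out of range, and the consumer falls back to its auto.offset.reset setting: earliest jumps to the oldest message still available, latest jumps to the end of the log, and none raises an error.
Longer retention = more storage cost but more flexibility (replay, late consumers).
Shorter retention = less storage cost but less replayability.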
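
A minimal sketch of the expired-bookmark case. The function resolve_position and the numbers are hypothetical; the three modes follow the values of the real consumer setting auto.offset.reset (earliest, latest, none), but this is a simplified model of the behavior, not client code.

```python
def resolve_position(committed, log_start, log_end, auto_offset_reset="earliest"):
    """Decide where a consumer resumes, in the spirit of auto.offset.reset.

    log_start is the oldest offset retention has not deleted yet;
    log_end is the offset the next produced message will receive.
    """
    if log_start <= committed <= log_end:
        return committed  # still valid: resume exactly where the group left off
    if auto_offset_reset == "earliest":
        return log_start  # jump to the oldest message still on disk
    if auto_offset_reset == "latest":
        return log_end  # skip ahead and read only new messages
    raise LookupError(f"offset {committed} is outside [{log_start}, {log_end}]")


# Retention has deleted offsets 0..299, so the log now starts at offset 300.
position = resolve_position(committed=120, log_start=300, log_end=1000)
print(position)        # 300
print(position - 120)  # 180 messages (offsets 120..299) were never read
```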

Trade-off:

Keep data longer if many consumers need history or reprocessing.
Keep data shorter if you only need real-time processing and want to save disk space.

In short: Offsets give each consumer group a personal bookmark. This enables recovery after failures, replayability, and flexibility, but the value of offsets is always tied to how long Kafka keeps the data (retention policy).

Kafka Lag

Kafka lag is the difference between:

the log-end offset of a partition (the offset the next produced message will receive), and
the committed offset of a consumer group (the offset of the next message the group will read).

Example:

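Partition has 1,000 messages (offsets 0 to 999, so the log-end offset is 1,000).
Consumer has processed up to and including offset 950, so its committed offset is 951.
Lag = 1,000 – 951 = 49 messages (offsets 951 to 999).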
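
A small Python sketch of the calculation, using Kafka's own convention in which both numbers point one past the last message: the log-end offset is the offset the next message will get, and the committed offset is the next offset the group will read, so the subtraction gives the unread count directly. Lag is measured per partition, and a group's total lag is the sum across its partitions. partition_lag and the snapshot values are hypothetical. The same per-partition quantities appear as CURRENT-OFFSET, LOG-END-OFFSET, and LAG in the output of kafka-consumer-groups.sh --describe.

```python
def partition_lag(log_end_offset, committed_offset):
    """Messages already written to a partition that the group has not consumed yet."""
    return max(0, log_end_offset - committed_offset)


# The example's numbers in Kafka's convention: offsets 0..999 exist, so the
# log-end offset is 1,000; the group has processed offset 950, so it commits 951.
print(partition_lag(1000, 951))  # 49

# Hypothetical snapshot of a three-partition topic: partition -> (log end, committed).
snapshot = {0: (1000, 951), 1: (620, 620), 2: (2400, 2100)}
total_lag = sum(partition_lag(end, done) for end, done in snapshot.values())
print(total_lag)  # 349
```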

Why Lag Matters

High or steadily growing lag = the consumer is too slow and is falling behind the stream.
Low, stable lag = the consumer is keeping up with the producer.
If retention deletes messages before a lagging consumer reaches them, the consumer never processes those messages: it loses data.
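
A back-of-the-envelope sketch of why the rate difference matters, assuming constant produce and consume rates (real traffic fluctuates, so treat the result as an estimate). The function name seconds_to_catch_up is hypothetical. When it returns None, the backlog keeps growing, and sooner or later retention may delete messages the group has not reached yet.

```python
def seconds_to_catch_up(lag, produce_rate, consume_rate):
    """Rough time for a lagging consumer to catch up, assuming constant rates.

    Rates are in messages per second. Returns None if the lag never shrinks.
    """
    if lag <= 0:
        return 0.0
    net_rate = consume_rate - produce_rate  # how fast the backlog shrinks
    if net_rate <= 0:
        return None  # the consumer is too slow: lag stays flat or grows
    return lag / net_rate


print(seconds_to_catch_up(60_000, produce_rate=500, consume_rate=800))  # 200.0
print(seconds_to_catch_up(60_000, produce_rate=500, consume_rate=400))  # None
```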
