One of the most important things to understand in Apache Kafka is how data flows. This happens through two main actors: producers and consumers.
What is a Producer?
A producer is an application that sends messages to Kafka.
- Producers decide which topic the message should go to.
- If a topic has multiple partitions, the producer decides which partition gets the message (either randomly, round-robin, or using a key).
- Producers can also request acknowledgments (acks) from brokers to make sure the message is safely stored.
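The key-based routing mentioned above can be sketched in a few lines. This is a simplified illustration, not the real client API: Kafka's default partitioner hashes the key with murmur2, while this sketch uses `md5` purely for a deterministic demo, and falls back to a round-robin counter when no key is given.

```python
import hashlib
from typing import Optional

def choose_partition(key: Optional[bytes], num_partitions: int, counter: int = 0) -> int:
    """Simplified sketch of producer partition selection.

    With a key, hash it so the same key always lands on the same partition
    (real Kafka uses murmur2; md5 is used here only for illustration).
    Without a key, fall back to round-robin via a counter.
    """
    if key is not None:
        digest = hashlib.md5(key).digest()
        return int.from_bytes(digest[:4], "big") % num_partitions
    return counter % num_partitions

# The same key always maps to the same partition, which preserves
# per-key ordering (e.g., all events for one customer stay together):
p1 = choose_partition(b"customer-42", 3)
p2 = choose_partition(b"customer-42", 3)
assert p1 == p2
```

Keying by something like a customer ID is what guarantees that all of that customer's events land in one partition and are therefore consumed in order.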
Example:
- An e-commerce app is a producer that sends an order event into the `orders` topic whenever a customer makes a purchase.
What is a Consumer?
A consumer is an application that reads messages from Kafka.
- Consumers subscribe to one or more topics.
- Messages are delivered in the order they appear inside each partition.
- Consumers keep track of their offset (bookmark) to know where to continue next time.
- Consumers are usually part of a consumer group:
- Each partition is read by only one consumer in the group.
- If more consumers join, the load is balanced.
- If one consumer fails, another takes over.
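The "one partition per consumer in the group" rule can be sketched as a simple round-robin assignment. This is an illustrative simplification: Kafka's real assignors (range, round-robin, sticky) handled by the group coordinator are more sophisticated, but the invariant is the same.

```python
def assign_partitions(partitions, consumers):
    """Sketch of partition assignment within a consumer group.

    Each partition goes to exactly one consumer; consumers share the
    load as evenly as possible.
    """
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

partitions = [f"orders-{i}" for i in range(6)]

# 6 partitions, 2 consumers -> 3 partitions each:
assignment = assign_partitions(partitions, ["c1", "c2"])

# If c2 fails, a "rebalance" reassigns everything to c1:
after_failure = assign_partitions(partitions, ["c1"])
assert after_failure["c1"] == partitions
```

Note the corollary: with more consumers than partitions, the extra consumers sit idle, so the partition count caps a group's parallelism.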
Example:
- A billing system is a consumer that reads messages from the `orders` topic to generate invoices.
Data Flow in Kafka (Step by Step)
- Producer sends a message: The producer sends an event (e.g., `Order #123 created`).
- Message stored in a topic partition: A Kafka broker stores it in the correct topic and partition.
- Message replicated: For fault tolerance, the partition leader copies the message to replicas on other brokers.
- Consumer reads message: A consumer fetches the message from its assigned partition.
- Offset updated: The consumer commits the offset, marking the message as “processed.”
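The five steps above can be traced end to end with a toy in-memory stand-in for a broker. This is purely a sketch of the flow, not Kafka code: "replication" here is just a second list, and a real broker persists the log to disk and replicates over the network.

```python
class MiniBroker:
    """Toy in-memory stand-in for a Kafka broker, tracing the five steps."""

    def __init__(self):
        self.partition = []          # the leader log for one partition (e.g., orders-0)
        self.replica = []            # a follower replica that would live on another broker
        self.committed_offsets = {}  # consumer group -> next offset to read

    def produce(self, message):
        self.partition.append(message)  # step 2: store in the topic partition
        self.replica.append(message)    # step 3: replicate for fault tolerance
        return len(self.partition) - 1  # ack back to the producer with the offset

    def fetch(self, group):
        offset = self.committed_offsets.get(group, 0)  # resume from the bookmark
        return offset, self.partition[offset:]         # step 4: consumer reads

    def commit(self, group, next_offset):
        self.committed_offsets[group] = next_offset    # step 5: offset updated

broker = MiniBroker()
broker.produce("Order #123 created")            # step 1: producer sends
offset, messages = broker.fetch("billing")      # billing group reads the event
broker.commit("billing", offset + len(messages))
```

After the commit, a second fetch by the `billing` group returns nothing new: the committed offset is what lets a restarted consumer pick up exactly where it left off.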
Diagram: Data Flow in Kafka
```mermaid
sequenceDiagram
    participant P as Producer
    participant B as Kafka Broker
    participant T as Topic (orders) with Partitions
    participant C as Consumer
    P->>B: Send message (Order #123)
    B->>T: Store message in partition (e.g., orders-0)
    T-->>B: Acknowledge write
    B-->>P: Ack (message stored)
    C->>B: Fetch from topic (orders-0)
    B->>C: Deliver message (Order #123)
    C->>B: Commit offset (mark as processed)
```
Why This Flow is Powerful
- Producers and consumers are decoupled: They don’t need to know about each other.
- Scalability: Many producers can write at once, and many consumers can read in parallel.
- Reliability: Messages are safely stored in brokers with replication.
- Flexibility: Multiple consumer groups can read the same data for different purposes (analytics, billing, monitoring).
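That last point is worth making concrete: every consumer group keeps its own offset into the same log, so billing and analytics (hypothetical group names here) read the same events independently, at their own pace. A minimal sketch:

```python
# One shared log, one independent offset ("bookmark") per consumer group.
log = ["Order #1", "Order #2", "Order #3"]
offsets = {"billing": 0, "analytics": 0}

def poll(group, max_messages=10):
    """Each group reads from its own offset; groups never affect each other."""
    start = offsets[group]
    batch = log[start:start + max_messages]
    offsets[group] = start + len(batch)
    return batch

# Billing drains the whole log while analytics has only read two messages;
# neither group's progress changes what the other sees.
poll("billing")
poll("analytics", max_messages=2)
```

Contrast this with a traditional queue, where a message consumed by one reader is gone for everyone else.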