Widhian Bramantya
Getting Started with Apache Kafka: Core Concepts and Use Cases

Posted on September 14, 2025 by admin

Apache Kafka is one of the most popular tools for handling real-time data. Many companies like LinkedIn, Netflix, and Uber use Kafka to process millions of events every second. But what exactly is Kafka, and why is it so powerful? Let’s break it down in simple words.

What is Apache Kafka?

Kafka is a distributed system for handling messages. Think of it as a giant mailbox for data:

  • Applications can send messages (like dropping letters into the mailbox).
  • Other applications can read messages (like checking the mailbox to get letters).

What makes Kafka different from a normal message queue is its speed, scale, and reliability. Kafka is designed to handle huge amounts of data in real time while making sure no data is lost.
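The mailbox analogy can be sketched in a few lines of Python. This is a toy in-memory model (an append-only log, the core idea behind a Kafka topic), not real Kafka, which persists the log on disk across many servers:

```python
# A toy "mailbox": an append-only log. Producers drop messages in,
# consumers read from any position. Real Kafka keeps this log on disk
# and replicates it across brokers; this sketch only shows the idea.
class ToyLog:
    def __init__(self):
        self.messages = []  # append-only storage

    def send(self, message):
        """Producer side: drop a letter into the mailbox."""
        self.messages.append(message)
        return len(self.messages) - 1  # position of the new message

    def read_from(self, position):
        """Consumer side: read every letter from a position onward."""
        return self.messages[position:]

log = ToyLog()
log.send("user-1 signed up")
log.send("user-2 signed up")
print(log.read_from(0))  # both messages
print(log.read_from(1))  # only the second one
```

Note that reading does not delete anything: unlike a classic queue, many readers can each read the same log from the beginning.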

Core Concepts

To understand Kafka, let’s go step by step:

1. Topic

A topic is like a category or folder for messages. For example:

  • A topic called user-signups can store all events when users register.
  • A topic called orders can store all purchase events.

2. Producer

A producer is an application that sends messages to a topic. Example:

  • An e-commerce app sends order details to the orders topic.
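A producer also decides which partition each message lands on, usually by hashing the message key. Here is a toy sketch of that behavior: real Kafka's default partitioner uses a murmur2 hash, so `crc32` below is just a stand-in that shows the same property, namely that one key always maps to one partition:

```python
import zlib

# Toy producer: picks a partition from the message key and appends to it.
# Because the hash is deterministic, all events for one order land on the
# same partition, so they stay in order relative to each other.
class ToyProducer:
    def __init__(self, num_partitions=3):
        self.partitions = [[] for _ in range(num_partitions)]

    def send(self, key, value):
        index = zlib.crc32(key.encode()) % len(self.partitions)
        self.partitions[index].append(value)
        return index

producer = ToyProducer()
p_created = producer.send("order-42", "created")
p_paid = producer.send("order-42", "paid")
assert p_created == p_paid  # same key -> same partition -> ordered events
```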

3. Consumer

A consumer is an application that reads messages from a topic. Example:

  • A billing system reads from the orders topic to create invoices.

4. Broker

A broker is a Kafka server that stores messages. Usually, Kafka has many brokers working together in a cluster, so data is safe and can be shared across machines.

5. Partition

Each topic can be split into partitions. This helps Kafka handle more data at the same time.

  • Example: The orders topic has 3 partitions. Messages are split across them, so many consumers can read in parallel.

6. Offset

An offset is a number that shows the position of a message in a partition. It’s like a bookmark, so consumers know where to continue reading.
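The bookmark idea looks roughly like this in code. Again this is a toy sketch, not a real Kafka client: it models one partition as a list and the offset as the next position to read:

```python
# Toy consumer: the offset is a bookmark into one partition's message list.
# After each poll the consumer "commits" the new position, so it can stop
# and resume later without re-reading or skipping messages.
class ToyConsumer:
    def __init__(self, partition):
        self.partition = partition  # list of messages (one partition)
        self.offset = 0             # next message to read

    def poll(self, max_messages=2):
        batch = self.partition[self.offset:self.offset + max_messages]
        self.offset += len(batch)   # advance the bookmark
        return batch

partition = ["order-1", "order-2", "order-3"]
consumer = ToyConsumer(partition)
print(consumer.poll())  # ['order-1', 'order-2']
print(consumer.poll())  # ['order-3']
```

In real Kafka the committed offset is stored on the broker side (in the `__consumer_offsets` topic), so a restarted consumer in the same group picks up exactly where it left off.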

Component Hierarchy

graph TD
    subgraph Cluster["Kafka Cluster"]
        B1[Broker 1]
        B2[Broker 2]
        B3[Broker 3]
    end

    %% Topics
    T_orders[Topic: orders]
    T_users[Topic: users]

    B1 --> T_orders
    B2 --> T_orders
    B3 --> T_orders
    B1 --> T_users
    B2 --> T_users

    %% Partitions for orders
    subgraph OrdersPartitions["Orders Partitions"]
        O0[orders-0 - leader B1, replicas B1,B2]
        O1[orders-1 - leader B2, replicas B2,B3]
        O2[orders-2 - leader B3, replicas B3,B1]
    end
    T_orders --> OrdersPartitions

    %% Partitions for users
    subgraph UsersPartitions["Users Partitions"]
        U0[users-0 - leader B2, replicas B2,B3]
        U1[users-1 - leader B3, replicas B3,B1]
    end
    T_users --> UsersPartitions

Quick Notes

  • 1 Cluster contains many Brokers.
  • 1 Broker stores many Topics (physically stored as partitions).
  • 1 Topic has multiple Partitions.
  • Each Partition has a Leader and Replicas (for high availability).
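The leader/replica note above can be illustrated with a toy failover sketch. In real Kafka the cluster controller handles leader election among in-sync replicas; this model only shows the basic idea that a surviving replica takes over when the leader's broker dies:

```python
# Toy model of one partition's leader/replica metadata.
class Partition:
    def __init__(self, name, replicas):
        self.name = name
        self.replicas = list(replicas)  # e.g. ["B1", "B2"]
        self.leader = self.replicas[0]  # first replica leads

    def broker_failed(self, broker):
        """Drop a dead broker; promote a surviving replica if it led."""
        self.replicas = [b for b in self.replicas if b != broker]
        if self.leader == broker and self.replicas:
            self.leader = self.replicas[0]

orders_0 = Partition("orders-0", ["B1", "B2"])
orders_0.broker_failed("B1")
print(orders_0.leader)  # B2 takes over, no data is lost
```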

Flow Process Diagram

flowchart LR
    subgraph Producers
        P1[Producer A]
        P2[Producer B]
    end

    subgraph KafkaCluster["Kafka Cluster"]
        subgraph TopicOrders["Topic: orders"]
            part0[Partition 0]
            part1[Partition 1]
            part2[Partition 2]
        end
    end

    subgraph Consumers
        subgraph CG1["Consumer Group: billing"]
            C1[Consumer 1]
            C2[Consumer 2]
        end
        subgraph CG2["Consumer Group: analytics"]
            C3[Consumer 1]
        end
    end

    P1 --> TopicOrders
    P2 --> TopicOrders

    part0 --> C1
    part1 --> C2
    part2 --> C1

    part0 -.-> C3
    part1 -.-> C3
    part2 -.-> C3
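The assignment shown in the diagram can be sketched with a simple round-robin: within one consumer group each partition is read by exactly one consumer, while separate groups each receive every partition independently. (Real Kafka negotiates this via configurable assignor strategies such as range or round-robin; this is only the round-robin idea.)

```python
# Toy round-robin assignment of partitions to consumers within one group.
def assign(partitions, consumers):
    assignment = {c: [] for c in consumers}
    for i, partition in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment

partitions = ["orders-0", "orders-1", "orders-2"]

# The "billing" group splits the work across its two consumers...
print(assign(partitions, ["billing-1", "billing-2"]))
# ...while the "analytics" group, with one consumer, reads everything.
print(assign(partitions, ["analytics-1"]))
```

This is why adding consumers to a group (up to the partition count) increases read throughput without duplicating work inside that group.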

Why Use Kafka?

Here are some reasons companies use Kafka:

  1. Scalability: Kafka can handle millions of messages per second by spreading data across partitions and brokers.
  2. Durability: Messages are stored safely, even if one server fails.
  3. Real-time Processing: Data can be read and acted on instantly.
  4. Integration: Kafka works well with databases, analytics tools, and microservices.
Category: Kafka
