Apache Kafka is a distributed event streaming platform, originally developed at LinkedIn and donated to the Apache Software Foundation in 2011. Kafka is neither a traditional message queue nor a database; it belongs to a separate category of systems commonly called distributed commit logs or event logs.
The core idea: all events are appended to an immutable log that is ordered within each partition. Consumers read from it at their own pace, and reading does not delete messages.
Topic: a logical channel to which producers write messages and from which consumers read them. A topic is an abstraction; physically, it is split into partitions.
Broker: an individual Kafka server. A cluster consists of multiple brokers, and each broker stores a subset of the partitions. No single broker is a master for all data: each partition has its own leader, and cluster metadata is coordinated through KRaft (in modern versions) or ZooKeeper (legacy versions).
Producer: publishes messages to a topic and decides which partition each message goes to, either explicitly or by delegating the choice to a partitioner (typically a hash of the message key).
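The partition choice can be sketched in a few lines. This is an illustration under stated assumptions, not Kafka's actual partitioner: `choose_partition` is a hypothetical name, `zlib.crc32` stands in for the murmur2 hash used by the Java client, and keyless records are spread round-robin for simplicity.

```python
import itertools
import zlib
from typing import Optional

# Hypothetical fallback counter for records that carry no key.
_keyless_counter = itertools.count()

def choose_partition(key: Optional[bytes], num_partitions: int) -> int:
    """Map a record key to a partition index (illustrative only)."""
    if key is None:
        # No key: spread records across partitions in turn.
        return next(_keyless_counter) % num_partitions
    # Same key -> same hash -> same partition, which keeps per-key ordering.
    return zlib.crc32(key) % num_partitions

first = choose_partition(b"user-42", 6)
second = choose_partition(b"user-42", 6)
```

Because the mapping depends on `num_partitions`, adding partitions later sends existing keys to different partitions.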
Consumer: reads messages from a topic and tracks an offset, its position in each partition. The committed offset marks the next message to read and is stored either in Kafka (the __consumer_offsets topic) or in external storage.
Offset: a monotonically increasing integer that identifies a message within a partition. There is no global topic offset, only per-partition offsets.
A partition is the physical unit of storage and parallelism. Each partition is an ordered, immutable sequence of messages stored on disk as a set of segment files.
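The relationship between a partition, its offsets, and independent readers can be illustrated with a small in-memory model. The `PartitionLog` class and its methods are hypothetical names for this sketch; a real partition lives in segment files on disk.

```python
class PartitionLog:
    """In-memory stand-in for one partition: an append-only list of records."""

    def __init__(self) -> None:
        self._records: list[bytes] = []

    def append(self, value: bytes) -> int:
        """Append a record and return the offset it was assigned."""
        self._records.append(value)
        return len(self._records) - 1

    def read(self, offset: int, max_records: int = 10) -> list[tuple[int, bytes]]:
        """Return (offset, value) pairs starting at `offset`; nothing is deleted."""
        chunk = self._records[offset:offset + max_records]
        return list(enumerate(chunk, start=offset))

log = PartitionLog()
for event in (b"created", b"paid", b"shipped"):
    log.append(event)

# Two independent readers keep their own positions in the same log.
fast_reader = log.read(offset=2)                 # nearly caught up
slow_reader = log.read(offset=0, max_records=1)  # still at the beginning
```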
A Consumer Group is a logical group of consumers jointly reading one topic. Kafka automatically distributes partitions among group members so that each partition is consumed by exactly one consumer within the group.
When group membership changes, Kafka performs a rebalance.
| Strategy | Description | When to Use |
|---|---|---|
| Range | Assign contiguous partition ranges per topic | Co-partitioned topics whose matching partitions should go to the same consumer |
| RoundRobin | Distribute evenly | Uniform consumers |
| Sticky | Minimize movement | Stateful consumers |
| CooperativeSticky | Incremental rebalance | Recommended in production |
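The difference between the first two strategies in the table is easiest to see on a single topic. The functions below are a simplified sketch with hypothetical names; they are not the client's assignor classes and ignore multi-topic subscriptions.

```python
def range_assign(partitions: int, consumers: list[str]) -> dict[str, list[int]]:
    """Give each consumer a contiguous block; the first consumers get the extras."""
    members = sorted(consumers)
    base, extra = divmod(partitions, len(members))
    result: dict[str, list[int]] = {}
    start = 0
    for i, member in enumerate(members):
        size = base + (1 if i < extra else 0)
        result[member] = list(range(start, start + size))
        start += size
    return result

def round_robin_assign(partitions: int, consumers: list[str]) -> dict[str, list[int]]:
    """Deal partitions out one at a time, like cards."""
    members = sorted(consumers)
    result: dict[str, list[int]] = {m: [] for m in members}
    for p in range(partitions):
        result[members[p % len(members)]].append(p)
    return result

by_range = range_assign(7, ["c1", "c2", "c3"])
by_round_robin = round_robin_assign(7, ["c1", "c2", "c3"])
```

With 7 partitions and 3 consumers, both strategies give `c1` three partitions and the others two each; only the grouping differs.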
Classic (eager) rebalance: every consumer gives up all of its partitions and stops processing until the new assignment is complete.
Cooperative Rebalance (Kafka 2.4+): only moved partitions pause; others continue processing.
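The practical difference between the two protocols is how many partitions stop being processed. The sketch below compares an old and a new assignment; `paused_partitions` is a hypothetical helper, and the multi-round negotiation of the real protocol is not modelled.

```python
def paused_partitions(old: dict[str, list[int]],
                      new: dict[str, list[int]],
                      cooperative: bool) -> set[int]:
    """Return the partitions that stop being processed during a rebalance."""
    if not cooperative:
        # Eager protocol: everything previously owned is revoked first.
        return {p for parts in old.values() for p in parts}
    # Cooperative protocol: only partitions whose owner changes are revoked.
    old_owner = {p: c for c, parts in old.items() for p in parts}
    new_owner = {p: c for c, parts in new.items() for p in parts}
    return {p for p, c in old_owner.items() if new_owner.get(p) != c}

before = {"c1": [0, 1, 2], "c2": [3, 4, 5]}
after = {"c1": [0, 1], "c2": [3, 4], "c3": [2, 5]}  # c3 joins the group

eager = paused_partitions(before, after, cooperative=False)
incremental = paused_partitions(before, after, cooperative=True)
```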
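Group Coordinator: the broker responsible for a given consumer group (the leader of that group's __consumer_offsets partition). It stores committed offsets, tracks group membership, and coordinates rebalances.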
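How a group is mapped to its coordinator can be sketched as follows. This mirrors the commonly described broker behavior (hash of the group ID modulo the partition count of __consumer_offsets, 50 by default) and should be read as an assumption, not a specification. The function names are hypothetical, and the hash matches Java's `String.hashCode()` only for characters in the Basic Multilingual Plane.

```python
def java_string_hash(s: str) -> int:
    """Reproduce Java's String.hashCode() as a signed 32-bit integer."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - 0x100000000 if h >= 0x80000000 else h

def offsets_partition_for(group_id: str, offsets_partitions: int = 50) -> int:
    """Pick the __consumer_offsets partition assumed to hold this group's state."""
    h = java_string_hash(group_id)
    # Guard the one value whose absolute value overflows a 32-bit int.
    positive = 0 if h == -2**31 else abs(h)
    return positive % offsets_partitions

partition = offsets_partition_for("ab")
```

The broker that leads the resulting partition acts as the coordinator for that group.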
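Choosing the number of partitions is one of the most important architectural decisions: the count can be increased later but not decreased, and increasing it changes which partition a given key maps to.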
| Factor | Impact |
|---|---|
| Desired throughput | More partitions = more parallelism |
| Number of consumers | Partition count should be at least the size of the largest consumer group; extra consumers stay idle |
| Replication factor | More files and storage |
| Rebalance latency | More partitions → slower rebalance |
| Broker memory | Each partition consumes memory |
Rule of thumb: start with max(target_throughput / single_partition_throughput, num_consumers).
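The rule of thumb translates directly into a small helper. The function name, the MB/s units, and the example numbers are assumptions for illustration; real per-partition throughput has to be measured on the target cluster.

```python
import math

def initial_partition_count(target_mb_s: float,
                            per_partition_mb_s: float,
                            num_consumers: int) -> int:
    """Starting point: enough partitions for throughput and for every consumer."""
    by_throughput = math.ceil(target_mb_s / per_partition_mb_s)
    return max(by_throughput, num_consumers)

throughput_bound = initial_partition_count(200, 15, 8)  # throughput dominates
consumer_bound = initial_partition_count(50, 25, 6)     # consumer count dominates
```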
Uneven key distribution can overload one partition.
Solutions:
- Add random suffixes
- Custom partitioners
- Pre-aggregation
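The random-suffix approach (often called key salting) can be sketched like this. The `#` separator, the bucket count, and both function names are hypothetical choices for this example.

```python
import random

def salted_key(key: str, buckets: int, rng: random.Random) -> str:
    """Append a random bucket number so one hot key spreads over several partitions."""
    return f"{key}#{rng.randrange(buckets)}"

def original_key(salted: str) -> str:
    """Strip the suffix on the consumer side before re-aggregating."""
    return salted.rsplit("#", 1)[0]

rng = random.Random(0)
produced = [salted_key("hot-customer", 4, rng) for _ in range(200)]
distinct_keys = set(produced)
```

The trade-off is that records for the same logical key are no longer ordered relative to each other, and consumers must merge the buckets back together.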
Log compaction (cleanup.policy=compact): Kafka retains at least the latest message per key and discards older values for that key.
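The effect of keeping only the latest message per key can be shown on a small list. This is a simplified model with a hypothetical `compact` function; it ignores tombstones (null values that delete a key) and the fact that the active segment is not compacted.

```python
def compact(log: list[tuple[str, str]]) -> list[tuple[int, str, str]]:
    """Keep only the newest record per key; surviving records keep their offsets."""
    latest: dict[str, tuple[int, str]] = {}
    for offset, (key, value) in enumerate(log):
        latest[key] = (offset, value)  # later records overwrite earlier ones
    return sorted((off, key, val) for key, (off, val) in latest.items())

log = [("u1", "a"), ("u2", "b"), ("u1", "c"), ("u3", "d"), ("u2", "e")]
compacted = compact(log)
```

Offsets 0 and 1 disappear because newer values exist for `u1` and `u2`; the remaining offsets are not renumbered.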
Messages are stored regardless of whether they were consumed.
Deletion policies:
- Time (retention.ms)
- Size (retention.bytes)
- Combination of both
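The deletion policies above operate on whole segments, not individual messages. The sketch below models that with a hypothetical `Segment` type and `apply_retention` function; it simplifies the broker's rules (for example, it merely keeps at least one segment instead of tracking an active segment).

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Segment:
    base_offset: int
    size_bytes: int
    last_timestamp_ms: int  # timestamp of the newest record in the segment

def apply_retention(segments: list[Segment], now_ms: int,
                    retention_ms: Optional[int] = None,
                    retention_bytes: Optional[int] = None) -> list[Segment]:
    """Return the segments that survive; `segments` is ordered oldest first."""
    kept = list(segments)
    if retention_ms is not None:
        # Time policy: drop segments whose newest record is too old.
        kept = [s for s in kept if now_ms - s.last_timestamp_ms <= retention_ms]
    if retention_bytes is not None:
        # Size policy: drop the oldest segments until the total fits.
        while len(kept) > 1 and sum(s.size_bytes for s in kept) > retention_bytes:
            kept.pop(0)
    return kept

segments = [Segment(0, 100, 1_000), Segment(100, 100, 5_000), Segment(200, 100, 9_000)]
by_time = apply_retention(segments, now_ms=10_000, retention_ms=6_000)
by_size = apply_retention(segments, now_ms=10_000, retention_bytes=150)
```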
Sequential disk I/O: append-only writes.
Zero-copy: uses sendfile().
Batching: processes batches instead of individual records.
Page Cache: frequently accessed data stays in RAM.
Compression: batch compression with snappy, lz4, zstd, gzip.
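Why batching and compression work well together can be demonstrated with the standard library alone. Here `zlib` (DEFLATE, the algorithm behind gzip) stands in for Kafka's codecs, and the record shape is made up for the example.

```python
import json
import zlib

# 100 small, similar records, as is typical for event streams.
records = [
    json.dumps({"user_id": i, "event": "page_view", "page": "/home"}).encode()
    for i in range(100)
]

# Compressing each record on its own cannot exploit repetition across records.
individually = sum(len(zlib.compress(r)) for r in records)

# Compressing the whole batch lets the codec reuse field names and values.
batched = len(zlib.compress(b"\n".join(records)))

ratio = batched / individually
```

The batch compresses to a fraction of the per-record total, because field names and repeated values are shared across records.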
| Mode | Description | Risk |
|---|---|---|
| At most once | No retries | Message loss |
| At least once | Retry on failure | Duplicates |
| Exactly once | Transactions + idempotency | More complexity |
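The first two rows of the table come down to the order of two steps on the consumer side: committing the offset and processing the message. The simulation below is a sketch with a hypothetical `run` function that injects a single crash between those two steps and then restarts from the committed offset.

```python
def run(messages: list[str], commit_first: bool, crash_at: int) -> list[str]:
    """Return the messages whose processing completed, including repeats."""
    processed: list[str] = []
    committed = 0   # offset of the next message to read after a restart
    crashed = False
    pos = 0
    while pos < len(messages):
        crash_now = pos == crash_at and not crashed
        if commit_first:
            committed = pos + 1               # step 1: commit
            if crash_now:                     # crash before processing
                crashed, pos = True, committed
                continue
            processed.append(messages[pos])   # step 2: process
        else:
            processed.append(messages[pos])   # step 1: process
            if crash_now:                     # crash before committing
                crashed, pos = True, committed
                continue
            committed = pos + 1               # step 2: commit
        pos += 1
    return processed

at_most_once = run(["a", "b", "c"], commit_first=True, crash_at=1)
at_least_once = run(["a", "b", "c"], commit_first=False, crash_at=1)
```

Committing first loses `b`; processing first handles `b` twice, which is why at-least-once consumers usually need idempotent processing.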
Idempotent producer (enable.idempotence=true): the broker discards duplicates caused by producer retries, using a producer ID and per-partition sequence numbers.
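The deduplication idea behind the idempotent producer can be sketched with a sequence number per producer. `BrokerPartition` is a hypothetical class; the real broker also rejects gaps in the sequence, which this model does not.

```python
class BrokerPartition:
    """Tracks the last sequence number accepted from each producer."""

    def __init__(self) -> None:
        self.log: list[str] = []
        self._last_seq: dict[int, int] = {}  # producer_id -> last accepted sequence

    def append(self, producer_id: int, sequence: int, value: str) -> bool:
        """Append unless this (producer_id, sequence) was already written."""
        if sequence <= self._last_seq.get(producer_id, -1):
            return False  # duplicate from a retry: acknowledge, do not append
        self.log.append(value)
        self._last_seq[producer_id] = sequence
        return True

partition = BrokerPartition()
first_try = partition.append(producer_id=1, sequence=0, value="order-created")
retry = partition.append(producer_id=1, sequence=0, value="order-created")
next_msg = partition.append(producer_id=1, sequence=1, value="order-paid")
```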
Transactions allow atomic writes across multiple partitions and topics, including the consumer offset commits of a read-process-write loop.
Kafka Streams: a stream processing library in the Kafka ecosystem.
ksqlDB: a SQL interface built on top of Kafka Streams.
{
"title": {
"text": "Comparison of Queueing and Streaming Systems",
"left": "center",
"top": 20,
"textStyle": {
"fontSize": 20,
"color": "#E5E7EB"
}
},
"tooltip": {
"trigger": "item"
},
"legend": {
"bottom": 10,
"textStyle": {
"fontSize": 13,
"color": "#CBD5E1"
},
"data": [
"Kafka",
"RabbitMQ",
"Redpanda",
"NATS JetStream",
"Pulsar"
]
},
"radar": {
"radius": "72%",
"center": ["50%", "45%"],
"name": {
"textStyle": {
"fontSize": 14,
"color": "#D1D5DB"
}
},
"axisLine": {
"lineStyle": {
"color": "#64748B"
}
},
"splitLine": {
"lineStyle": {
"color": "#475569"
}
},
"splitArea": {
"show": true,
"areaStyle": {
"color": [
"rgba(51,65,85,0.10)",
"rgba(51,65,85,0.18)"
]
}
},
"indicator": [
{ "name": "Throughput", "max": 10 },
{ "name": "Low latency", "max": 10 },
{ "name": "Operational simplicity", "max": 10 },
{ "name": "Ordering", "max": 10 },
{ "name": "Replay", "max": 10 },
{ "name": "Ecosystem", "max": 10 }
]
},
"series": [
{
"type": "radar",
"lineStyle": {
"width": 2
},
"data": [
{ "value": [10, 6, 4, 8, 10, 10], "name": "Kafka" },
{ "value": [6, 9, 8, 7, 3, 7], "name": "RabbitMQ" },
{ "value": [9, 5, 6, 8, 10, 6], "name": "Redpanda" },
{ "value": [7, 8, 7, 5, 2, 5], "name": "NATS JetStream" },
{ "value": [8, 7, 5, 6, 8, 4], "name": "Pulsar" }
]
}
]
}
| Characteristic | Kafka | RabbitMQ |
|---|---|---|
| Model | Log-based (pull) | Message broker (push) |
| Storage | Retention-based | Deletes after ACK |
| Replay | Yes | No |
| Throughput | Millions of messages/sec | Tens of thousands of messages/sec |
| Latency | 5–15 ms | <1 ms possible |
| Routing | Topic/partition only | Advanced routing |
| Ordering | Per partition | Per queue (weakened by competing consumers and redelivery) |
| Complexity | Higher | Lower |
| Best for | Analytics, event sourcing | Task queues, RPC |
| Characteristic | Kafka | Redpanda |
|---|---|---|
| Language | Java/Scala | C++ |
| Coordination | KRaft | Built-in Raft |
| Latency | 5–15 ms | 1–3 ms |
| Deployment | More complex | Simpler |
| Ecosystem | Huge | Growing |
| Maturity | Battle-tested | Younger |
| Characteristic | Kafka | Pulsar |
|---|---|---|
| Storage Architecture | Broker-based | Compute/storage separation |
| Scaling | Requires partition reassignment | Seamless |
| Multi-tenancy | Limited | Native |
| Geo-replication | MirrorMaker (separate tool) | Built-in |
| Subscriptions | Consumer Groups | Multiple modes |
| Complexity | High | Very high |
| Characteristic | Kafka | NATS JetStream |
|---|---|---|
| Latency | 5–15 ms | <1 ms |
| Simplicity | Complex | Minimal |
| Throughput | Higher | Lower |
| Ecosystem | Rich | Smaller |
| Storage | Long-term | Less scalable |
✅ Kafka fits when:
- Very high throughput
- Event sourcing
- Multiple independent consumer groups
- Real-time analytics
- Long-term event storage
- Heterogeneous integrations
❌ Kafka may not fit when:
- You need latency below 1 ms
- Simple task queues are enough
- Complex routing is required
- Team lacks operational expertise
- Message volume is small