Day 07 — Apache Kafka Deep Dive — Partitions, Replication, Consumer Groups, Exactly-Once
Kafka is the universal log of modern data infra. Mastery of partition keys, consumer-group rebalancing, and EOS is the difference between a streaming system tha…
Kafka is a distributed append-only log, partitioned for parallelism and replicated for durability. Almost every advanced behaviour you hit in practice falls out of those two design choices.
🧠 Concept
Why it matters & the mental model.
1. The mental model
A topic is a logical stream, split into N partitions. Each partition is an immutable, ordered log on disk, replicated to R brokers. Producers append by key (key → partition via hash). Consumers in a group divide the partitions among themselves; each consumer commits its own offset.
2. The order guarantee
Kafka guarantees per-partition order, never global. If user_42 must be ordered, key by user_id and every event for that user lands on the same partition. Cross-key order is not guaranteed; if you need it you must collapse to one partition (and lose parallelism).
3. Replication & ISR
Each partition has a leader (handles reads/writes) and R-1 followers. ISR = in-sync replicas, those that have caught up within replica.lag.time.max.ms. Producers can set acks:
acks=0: fire-and-forget. Lose data on broker death.acks=1: leader acked. Lose if leader dies before replication.acks=all: all ISR acked. Survivesmin.insync.replicas - 1failures. Production default:acks=all, min.insync.replicas=2, replication.factor=3.
4. Consumer groups & rebalancing
All consumers in the same group.id share partitions; each partition is read by exactly one consumer in the group. When members come/go, the group coordinator triggers a rebalance:
- Eager rebalance (legacy): everyone stops, partitions reassigned, everyone starts → "stop the world".
- Cooperative incremental rebalance (KIP-429, default 3.x): only impacted partitions move, others keep consuming. Much smoother.
Set
partition.assignment.strategy = CooperativeStickyAssignor.
🛠 Deep Dive
Internals, code, architecture.
5. Offset management — the source of duplicates
Offsets are stored in the internal __consumer_offsets topic. The classic pitfall: auto-commit + manual processing — commit may run before your side-effect completes, so on crash you skip events. Two safe patterns:
- At-least-once + idempotent sink: process, then commit; on crash, replay a few events, downstream handles duplicates (idempotency key, upsert).
- Exactly-once (EOS): producer uses
transactional.id, wrapssend+sendOffsetsToTransactionin a transaction; consumer setsisolation.level=read_committed. End-to-end exactly-once across Kafka→processing→Kafka.
6. Producer internals
- Batching (
linger.ms,batch.size): wait up to linger.ms to accumulate batch.size bytes per partition before sending. Throughput vs latency knob. - Compression (
compression.type=zstd): 3-5× reduction, almost free CPU on modern brokers. - Idempotent producer (
enable.idempotence=true, default in 3.x): each producer gets an ID; broker dedupes on retry within a session. Different from EOS but a prerequisite.
7. Storage & retention
Each partition is a series of segment files (e.g. 1 GB each). Retention can be time-based (retention.ms), size-based (retention.bytes), or log-compacted (keep only latest value per key — great for CDC/state).
8. Performance — sequential IO is king
Kafka's speed comes from:
- Append-only sequential disk writes: SSDs and HDDs both love this.
- Zero-copy via
sendfile: data goes disk → NIC without touching JVM heap. - Page cache: hot data served from RAM by the OS; Kafka itself uses little JVM heap.
🚀 In Practice
Trade-offs, exercises, what to ship today.
9. Common ops nightmares
- Under-replicated partitions: a broker is lagging; check disk, GC, network.
- Consumer lag growing: scale consumers (≤ partitions), check downstream sink, profile UDFs.
- Rebalance storm: misconfigured
session.timeout.ms/max.poll.interval.mscausing consumers to be marked dead → frequent rebalances → no progress. Tune so processing time + buffer <max.poll.interval.ms. - Hot partition: bad key choice (e.g. country_code with US dominating). Re-key or use composite key.
10. Schema discipline
Use a Schema Registry (Confluent / Apicurio) with Avro / Protobuf / JSON Schema. Pick a compatibility mode (BACKWARD is the safe default). This stops the inevitable "producer added a field, consumer broke" outage.
11. Streaming on top
- Kafka Streams (JVM library, exactly-once, stateful joins/aggregations).
- Flink (separate cluster, richer semantics, watermarks, sessions).
- Spark Structured Streaming (micro-batch, easy if you're already on Spark).
- ksqlDB (SQL-on-Kafka for simple transforms).
12. What to take away
"Design a streaming pipeline for X." Strong answers always specify: partition key + rationale, RF + acks, idempotency strategy, lag monitoring, and one recovery scenario. Bonus points for naming min.insync.replicas correctly.
Resources
- 🎥 Confluent — Kafka Internals (Tim Berglund)
- 📖 Kafka: The Definitive Guide (free download)
- 📖 Confluent — Exactly Once Semantics
- 📖 Jay Kreps — The Log: What every software engineer should know
Practice Problem: Sliding Window Maximum (Hard)