Search Tech Journey

Find topics, journeys and posts

back to blog
data engineeringintermediate 12m2026-06-09

Kafka Part 2 — Replication, ISR, Consumer Groups, Exactly-Once

Session 15 of the 48-session learning series.

Date: Sun, 2026-06-21 · Time: 09:00–11:00 IST · Track: 🗂️ Data Engineering (DE) · Parent 28-day topic: Day 07 · Est. read: 2 h

Why this session matters

This is Session 15 of 48 in the Data Engineering track. It builds on the rhythm of one focused topic, paced so you have time to actually absorb it rather than rush.

Agenda

  • Replication — leader/follower, ISR, high watermark, acks
  • Consumer groups — rebalance, sticky vs cooperative, offset management
  • Delivery semantics — at-most-once, at-least-once, exactly-once
  • Transactions and the idempotent producer — what they actually guarantee
  • Operating Kafka — unclean leader election, min.insync.replicas, retention

Pre-read (skim before the session)

Deep dive

1. Replication primer

Each partition has one leader and zero or more followers (replicas). Writes go to the leader; followers Fetch to copy. The set of followers caught up to the leader is the ISR (In-Sync Replicas).

producer ── write ──▶ leader ── replicate ──▶ follower1
                                  └────────▶ follower2
                                  └────────▶ follower3 (lagging — kicked out of ISR)

A follower stays in ISR as long as it has fetched within replica.lag.time.max.ms (default 30 s). Drop behind and you're out — and writes can succeed without you.

2. The high watermark

The high watermark (HW) is the offset up to which all ISR replicas have replicated. Consumers can only read up to the HW. Why? Because anything past the HW might be lost if the leader dies before followers catch up.

offsets:  0 1 2 3 4 5 6 7 8
leader:   ■ ■ ■ ■ ■ ■ ■ ■ ■
isr1:     ■ ■ ■ ■ ■ ■ ■
isr2:     ■ ■ ■ ■ ■ ■ ■
HW = 7 (consumers can read 0..6)
LEO = 9 (Log End Offset on leader)

3. acks — producer durability knob

acks=0  → fire and forget. Lowest latency, can lose unbounded messages.
acks=1  → leader acks once written to its log. Can lose 1 message on leader failure.
acks=all → leader acks after all ISR replicas have written. No data loss IF
            min.insync.replicas ≥ 2 and ISR ≥ min.insync.replicas.

The gotcha: acks=all alone is not durable. If your ISR shrinks to just the leader (followers all lag), acks=all is effectively acks=1. Always pair with min.insync.replicas=2 (for RF=3) so the leader rejects writes when it can't guarantee.

4. min.insync.replicas — the actual durability knob

RFmin.insync.replicasToleratesNotes
321 broker lossStandard for production
330 broker lossHigher latency, no resilience
312 broker lossBut data can be lost on leader failure

When ISR \< min.insync.replicas, producers get NotEnoughReplicasException and writes pause. That's the desired behaviour — pause rather than silently risk data loss.

5. Unclean leader election

If all ISR replicas die, what happens? Two options:

  • unclean.leader.election.enable=false (default since 2.x): partition unavailable until an ISR replica returns. Data preserved.
  • unclean.leader.election.enable=true: an out-of-sync replica becomes leader. Service restored; data loss (everything past the lagger's offset is gone).

Don't enable unclean election unless you're 100% sure your data can be re-derived.

6. Consumer groups

Multiple consumers in the same group share a topic — each partition assigned to exactly one consumer. Add consumers → throughput scales linearly up to # partitions. Beyond that, extra consumers sit idle.

topic: 6 partitions
group: 3 consumers
→ each consumer gets 2 partitions

group: 6 consumers
→ each consumer gets 1 partition

group: 9 consumers
→ 6 active, 3 idle

7. Rebalance — the operational landmine

When a consumer joins or leaves, Kafka rebalances: all consumers stop, partitions are reassigned. Two modes:

  • Eager (legacy): everyone gives up all partitions, then re-acquires. Full stop-the-world.
  • Cooperative (KIP-429, default since 2.4): only the moving partitions pause. Most consumers keep processing.

Pre-cooperative, a rolling deploy could cause 30+ seconds of cumulative pause. Cooperative cut that to seconds. Always use cooperative now (partition.assignment.strategy=org.apache.kafka.clients.consumer.CooperativeStickyAssignor).

8. Offsets — where you are in the log

Consumers commit their offset to __consumer_offsets (an internal Kafka topic). Two modes:

  • Auto-commit (enable.auto.commit=true, every 5 s): convenient, but commits before you've processed the messages → at-least-once where you might skip on crash. Avoid for non-trivial work.
  • Manual commit (consumer.commitSync() after processing): commit after you've durably written / acked downstream. Standard for production.

9. Delivery semantics

ModeProducerConsumerUse case
At-most-onceacks=0auto-commit before processMetrics, logs (loss OK)
At-least-onceacks=allcommit after processMost pipelines (must be idempotent)
Exactly-onceidempotent producer + transactionsread-process-write in transactionFinancial, audit

10. Idempotent producer (KIP-98)

Setting enable.idempotence=true makes the producer attach a producer ID (PID) and sequence number to each record. The broker dedupes any retried record with the same (PID, seq). This eliminates the classic "retry caused duplicate" bug.

Default since 3.0. Always on. Free correctness improvement.

11. Transactions

producer = KafkaProducer(transactional_id="payments-tx-1", enable_idempotence=True)
producer.init_transactions()

producer.begin_transaction()
producer.send("orders", order_msg)
producer.send("payments", payment_msg)
producer.send_offsets_to_transaction(consumer.position(), consumer_group)
producer.commit_transaction()  # or abort_transaction() on error

Transactions provide atomic multi-partition writes and read-process-write exactly-once when the consumer reads in read_committed mode (skips aborted records). They cost 5–20% throughput. Worth it for billing, ledgers, audit pipelines.

12. Retention

log.retention.hours=168       # 7 days
log.retention.bytes=-1        # unlimited
log.segment.bytes=1073741824  # 1 GB segments
log.cleanup.policy=delete     # or 'compact' for keyed topics

Compacted topics keep only the latest value per key (key=customer_id, value=current profile) — a Kafka topic doubles as a changelog / KV store. State stores in Kafka Streams are built on compacted topics.

13. Operating in production

  • Monitor: under-replicated partitions, ISR shrinks, consumer lag, request latency.
  • Capacity: aim for <70% disk usage; partitions sized so retention fits.
  • Partition count: 100–1000 per broker; too many = high controller overhead, slow leader election.
  • JVM tuning: G1GC, ~25% heap, rest for OS page cache (Kafka leans on page cache hard).
  • Network: 10 Gbps minimum for serious throughput; producers are usually network-bound.

Reading material

In-depth research material

Video reference

▶︎ Confluent — Kafka Internals (Tim Berglund)

Pick a quiet 30 minutes during this session to actually watch it. Don't multitask.

LeetCode — Design Hit Counter

Post-session checklist

By the end of this session you should be able to:

  • Explain the ISR + high-watermark mechanism with a diagram.
  • State the difference between acks=all and acks=all + min.insync.replicas=2.
  • Describe cooperative rebalancing and why it beats eager.
  • Implement at-least-once consumer logic with manual offset commits.
  • Explain what transactions guarantee (and what they don't).
  • Solve design-hit-counter — bucketed time window, same primitive as Kafka's log segment trimming.

Generated from sessions_data.py + content_part*.py. To edit a video / leetcode / title, edit the data file and re-run write_sessions.py.