data engineering · beginner · 5m · 2026-05-24

Kafka 101 for ML engineers

Topics, partitions, consumer groups — the parts of Kafka that actually matter when you put ML features behind it.

ML systems don't fail because the model is bad. They fail because the feature pipeline upstream is bad. Kafka is the single most common piece of that pipeline. Here's the part you actually need to know.

The mental model

Kafka is a durable, partitioned, replayable log. That's the whole product.

A topic is a named log.
A topic is split into partitions, each an ordered append-only file.
Producers append to a partition, chosen by hashing the record key when one is set.
Consumers read sequentially and commit their offset back to Kafka.
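
To make the four bullets concrete, here is a minimal in-memory sketch of the same model. It is a toy, not a Kafka client: the Topic and Consumer classes are hypothetical, and zlib.crc32 is only a stand-in for the hash (Kafka's default Java partitioner uses murmur2).

```python
import zlib
from collections import defaultdict


class Topic:
    """A toy in-memory topic: a fixed number of append-only partitions."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Stand-in hash; real Kafka clients use a different function (murmur2).
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def produce(self, key, value):
        """Append a record and return (partition, offset)."""
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1


class Consumer:
    """Reads a partition sequentially and tracks a committed offset for it."""

    def __init__(self, topic):
        self.topic = topic
        self.committed = defaultdict(int)  # partition -> next offset to read

    def poll(self, partition, max_records=10):
        start = self.committed[partition]
        return self.topic.partitions[partition][start:start + max_records]

    def commit(self, partition, count):
        self.committed[partition] += count


topic = Topic("user-events", num_partitions=3)
for i in range(5):
    topic.produce("user-42", f"event-{i}")

p = topic.partition_for("user-42")
consumer = Consumer(topic)

first = consumer.poll(p, max_records=2)
consumer.commit(p, len(first))
rest = consumer.poll(p)

# Replay: rewind the committed offset and read the same records again.
consumer.committed[p] = 0
replayed = consumer.poll(p)

print([v for _, v in first])     # ['event-0', 'event-1']
print([v for _, v in rest])      # ['event-2', 'event-3', 'event-4']
print(len(replayed))             # 5
```

The last three lines are the "replayable" part: nothing is deleted when a consumer reads, so rewinding an offset is all it takes to reprocess.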

flowchart LR
  P1[Producer A] --> T1[(Topic: user-events)]
  P2[Producer B] --> T1
  T1 --> CG1[Consumer Group: feature-store]
  T1 --> CG2[Consumer Group: realtime-ranker]
  CG1 --> FS[(Feature store)]
  CG2 --> RR[Realtime ranker]

Partition keys: get this right

Kafka guarantees order within a partition, not across partitions. If you key by user_id, every event for that user lands on the same partition (as long as the partition count doesn't change), so order is preserved per user. Forget the key and you'll be debugging "why did the model see a click before the impression" for a week.
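
A small sketch of why the key matters. The assumptions are loud here: zlib.crc32 stands in for Kafka's real hash, and strict round-robin stands in for what a client does with keyless records (real clients differ in detail, but the effect on ordering is the same).

```python
import zlib
from itertools import count

NUM_PARTITIONS = 4


def keyed_partition(key):
    # Same key always maps to the same partition (stand-in hash, not murmur2).
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


_round_robin = count()


def unkeyed_partition():
    # Simplified stand-in for keyless records: spread across partitions.
    return next(_round_robin) % NUM_PARTITIONS


events = [
    ("user-1", "impression"),
    ("user-2", "impression"),
    ("user-1", "click"),
    ("user-2", "click"),
]

keyed = {p: [] for p in range(NUM_PARTITIONS)}
unkeyed = {p: [] for p in range(NUM_PARTITIONS)}
for user, action in events:
    keyed[keyed_partition(user)].append((user, action))
    unkeyed[unkeyed_partition()].append((user, action))


def partitions_per_user(layout):
    """Map each user to the set of partitions holding their events."""
    seen = {}
    for p, records in layout.items():
        for user, _ in records:
            seen.setdefault(user, set()).add(p)
    return seen


keyed_spread = partitions_per_user(keyed)
unkeyed_spread = partitions_per_user(unkeyed)

print({u: len(ps) for u, ps in keyed_spread.items()})    # one partition per user
print({u: len(ps) for u, ps in unkeyed_spread.items()})  # two partitions per user
```

Once a user's impression and click sit on different partitions, nothing tells a consumer which one to read first. That is the week of debugging.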

Consumer groups in one line

Consumers in the same group split partitions among themselves. Consumers in different groups each read the whole topic.

That single property is what lets your feature store, your monitoring, and your retraining job all coexist on one topic without stepping on each other.
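
Here is a sketch of that property, assuming a plain round-robin assignment. Kafka ships several assignment strategies and this is not any one of them exactly; the member names (fs-1, rr-1, and so on) are made up for illustration.

```python
def assign(partitions, members):
    """Round-robin the partitions over the sorted members of ONE group."""
    members = sorted(members)
    assignment = {m: [] for m in members}
    for i, p in enumerate(sorted(partitions)):
        assignment[members[i % len(members)]].append(p)
    return assignment


partitions = range(6)
groups = {
    "feature-store": ["fs-1", "fs-2", "fs-3"],
    "realtime-ranker": ["rr-1", "rr-2"],
}

# Each group is assigned independently, over the full partition set.
plan = {group: assign(partitions, members) for group, members in groups.items()}

for group, assignment in plan.items():
    print(group, assignment)
# feature-store {'fs-1': [0, 3], 'fs-2': [1, 4], 'fs-3': [2, 5]}
# realtime-ranker {'rr-1': [0, 2, 4], 'rr-2': [1, 3, 5]}
```

Within a group, every partition has exactly one owner. Across groups, every partition is read once per group. Adding a consumer to feature-store changes nothing for realtime-ranker.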

Things that will bite you

Key points

The minimum useful diagram

For ML use cases, picture three flows on top of every topic:

Hot path — realtime feature aggregation feeding the ranker.
Warm path — minute-level rollups into your feature store.
Cold path — periodic dump to your lake for training.

Same topic. Three consumer groups. Three SLAs. That's Kafka for ML in one sentence.
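
A toy version of those three flows, assuming a single partition and hypothetical group names. The point it illustrates is that each group keeps its own offset, so each can run at its own pace against the same log.

```python
# One partition of a topic, already holding ten events.
log = [f"event-{i}" for i in range(10)]

# Each consumer group tracks its own position in the log.
offsets = {"hot-ranker": 0, "warm-feature-store": 0, "cold-lake-dump": 0}


def poll(group, max_records):
    """Read up to max_records for one group and advance only that group's offset."""
    start = offsets[group]
    batch = log[start:start + max_records]
    offsets[group] = start + len(batch)
    return batch


# Hot path: one event at a time, fully caught up.
for _ in range(len(log)):
    poll("hot-ranker", 1)

# Warm path: one rollup batch so far.
poll("warm-feature-store", 5)

# Cold path: the periodic dump has not run yet.

lag = {group: len(log) - offset for group, offset in offsets.items()}
print(lag)  # {'hot-ranker': 0, 'warm-feature-store': 5, 'cold-lake-dump': 10}
```

Lag per group is the number to watch against each SLA: near zero for the hot path, bounded for the warm path, and allowed to grow between runs for the cold path.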
