recommendation systems · intermediate · 6m · 2026-05-26

Designing a recommendation system from scratch

Retrieval vs ranking, candidate generation, freshness vs relevance — the tradeoffs every real recommender lives by.

Every recommender system you've ever built (or ever will build) eventually becomes the same two-stage pipeline:

Retrieval — narrow billions of items down to ~1000 candidates, fast.
Ranking — score those 1000 candidates with a heavier model, return the top 20.

The art is choosing the right model for each stage and keeping them honest about what they're optimizing.
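As a rough illustration of that shape, here is a minimal sketch of the two-stage flow in plain Python. The names `recommend`, `Retriever`, and `Scorer` are hypothetical and used only for this example; the toy retrievers and scores stand in for real candidate sources and a real ranking model.

```python
from typing import Callable, Iterable

# Hypothetical type aliases for this sketch.
Retriever = Callable[[str, int], list[str]]  # (user_id, k) -> candidate item ids
Scorer = Callable[[str, str], float]         # (user_id, item_id) -> relevance score


def recommend(user_id: str,
              retrievers: Iterable[Retriever],
              scorer: Scorer,
              n_candidates: int = 1000,
              n_results: int = 20) -> list[str]:
    # Stage 1: retrieval. Union the cheap candidate sources, de-duplicating
    # while preserving first-seen order, then cap the pool size.
    candidates: dict[str, None] = {}
    for retrieve in retrievers:
        for item_id in retrieve(user_id, n_candidates):
            candidates.setdefault(item_id, None)
    pool = list(candidates)[:n_candidates]

    # Stage 2: ranking. Run the expensive scorer only on the small pool.
    ranked = sorted(pool, key=lambda item_id: scorer(user_id, item_id), reverse=True)
    return ranked[:n_results]


# Toy stand-ins: two candidate sources and a lookup-table "model".
popular = lambda user_id, k: ["a", "b", "c"][:k]
fresh = lambda user_id, k: ["c", "d"][:k]
toy_scores = {"a": 0.1, "b": 0.9, "c": 0.5, "d": 0.7}

top = recommend("u1", [popular, fresh], lambda u, i: toy_scores[i], n_results=2)
print(top)  # ['b', 'd']
```

The point of the split is cost: the scorer is called once per pooled candidate, never once per item in the catalog.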

The two-stage pipeline

Retrieval stage

Retrieval has one job: make sure the right item is somewhere in the top-1000. Recall is everything. Common approaches:

Co-occurrence / matrix factorization — classic. Fast. Underrated.
Two-tower embeddings — user tower, item tower, dot product in latent space. ANN-served.
Heuristic rails — fresh, popular, geographic, "more from creator X". Always include these as candidate sources.
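To make the two-tower idea concrete, here is a small sketch that scores items by dot product against a user vector and keeps the top k. It does an exact brute-force scan, which an approximate nearest neighbor (ANN) index would approximate at scale. The vectors and the helper names `dot` and `retrieve_top_k` are made up for illustration; in practice the embeddings would come from trained user and item towers.

```python
import heapq


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def retrieve_top_k(user_vec, item_vecs, k):
    """Exact top-k item ids by dot product (brute force, not ANN)."""
    return heapq.nlargest(
        k, item_vecs, key=lambda item_id: dot(user_vec, item_vecs[item_id])
    )


# Toy embeddings, assumed to be outputs of the user and item towers.
user_vec = [1.0, 0.0]
item_vecs = {
    "jazz": [0.9, 0.1],
    "rock": [0.2, 0.8],
    "blues": [0.7, 0.3],
}

print(retrieve_top_k(user_vec, item_vecs, 2))  # ['jazz', 'blues']
```

Because item vectors do not depend on the user, they can be computed offline and indexed; only the user vector has to be computed at request time.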

Ranking stage

Now you have 1000 candidates. Ranking decides the order. Features explode here: query × item × user × context × history. Models are typically gradient-boosted trees or deep cross-network architectures.
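A small sketch of what crossing features can look like. To stay dependency-free, it scores with a hand-weighted logistic function, which is a deliberately simplified stand-in for the tree or deep models mentioned above, not a substitute for them. The field names (`country`, `category`, `history_categories`, `hour`) and the weights are hypothetical.

```python
import math


def cross_features(user, item, context):
    """Build a sparse feature dict, including a few explicit crosses."""
    in_history = item["category"] in user["history_categories"]
    return {
        f"user_country={user['country']}": 1.0,
        f"item_category={item['category']}": 1.0,
        f"hour_bucket={context['hour'] // 6}": 1.0,
        # Crosses: user x item, and history x item.
        f"country_x_category={user['country']}|{item['category']}": 1.0,
        f"category_in_history={in_history}": 1.0,
    }


def score(feats, weights, bias=0.0):
    """Logistic score over sparse features (stand-in for a real ranker)."""
    z = bias + sum(weights.get(name, 0.0) * value for name, value in feats.items())
    return 1.0 / (1.0 + math.exp(-z))


user = {"country": "DE", "history_categories": {"jazz"}}
ctx = {"hour": 21}
jazz = {"category": "jazz"}
rock = {"category": "rock"}
weights = {"category_in_history=True": 2.0}  # made-up weight for illustration

jazz_score = score(cross_features(user, jazz, ctx), weights)
rock_score = score(cross_features(user, rock, ctx), weights)
print(jazz_score > rock_score)  # True
```

Even this toy version shows why the feature count grows quickly: every pair of fields you cross multiplies the number of distinct feature names.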

Retrieval vs ranking — at a glance

Stage	Goal	Latency budget	Model size	Metric
Retrieval	High recall	< 30ms	Small / ANN	Recall@1000
Ranking	High precision	< 100ms	Large / DNN	nDCG, AUC, business metric
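For reference, the two offline metrics named in the table can be computed as follows. This is a minimal sketch using the common log2-discount form of DCG with linear gains; the function names are chosen for this example.

```python
import math


def recall_at_k(retrieved, relevant, k):
    """Fraction of relevant items that appear in the first k retrieved."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)


def dcg_at_k(gains, k):
    """Discounted cumulative gain: position i (0-based) is divided by log2(i + 2)."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))


def ndcg_at_k(ranked, gain_by_item, k):
    """DCG of the given ranking divided by the DCG of the ideal ranking."""
    gains = [gain_by_item.get(item_id, 0.0) for item_id in ranked]
    ideal = sorted(gain_by_item.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


print(recall_at_k(["a", "b", "c"], {"a", "d"}, 3))        # 0.5
print(ndcg_at_k(["a", "b"], {"a": 3.0, "b": 1.0}, 2))     # 1.0
print(ndcg_at_k(["b", "a"], {"a": 3.0, "b": 1.0}, 2) < 1)  # True
```

Recall@k ignores order inside the top k, which fits retrieval; nDCG rewards putting the best items first, which fits ranking.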

Tradeoffs you can't avoid

Freshness vs relevance. A model trained on yesterday's data has more signal to learn from and scores better for it, but users want today's content.
Diversity vs engagement. Models will collapse to "show the user 5 things they obviously like". You'll have to fight for diversity explicitly.
Cold start. New users have no signal. New items have no clicks. Bootstrap with content embeddings + popularity priors.
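Two of these tradeoffs have simple, widely used mitigations that can be sketched in a few lines. The first function shrinks an item's click-through rate toward a popularity prior, so a new item with no impressions gets the prior instead of zero. The second is a greedy maximal marginal relevance (MMR) re-rank, one common way to trade relevance against diversity. Both are illustrative sketches: `prior_ctr`, `prior_strength`, `lam`, and the category-based similarity are assumed values, not recommendations.

```python
def smoothed_ctr(clicks, impressions, prior_ctr=0.05, prior_strength=100):
    """CTR shrunk toward a prior; prior_strength acts like pseudo-impressions."""
    return (clicks + prior_ctr * prior_strength) / (impressions + prior_strength)


def mmr_rerank(scores, similarity, k, lam=0.7):
    """Greedy MMR: pick items maximizing lam * score - (1 - lam) * max similarity
    to anything already selected."""
    selected = []
    remaining = set(scores)
    while remaining and len(selected) < k:
        def mmr(item):
            max_sim = max((similarity(item, s) for s in selected), default=0.0)
            return lam * scores[item] - (1 - lam) * max_sim
        # Sort first so ties break deterministically by item id.
        best = max(sorted(remaining), key=mmr)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: two near-duplicate jazz items and one rock item.
scores = {"jazz1": 0.9, "jazz2": 0.85, "rock1": 0.6}
category = {"jazz1": "jazz", "jazz2": "jazz", "rock1": "rock"}
same_category = lambda a, b: 1.0 if category[a] == category[b] else 0.0

print(mmr_rerank(scores, same_category, k=2))           # ['jazz1', 'rock1']
print(mmr_rerank(scores, same_category, k=2, lam=1.0))  # ['jazz1', 'jazz2']
```

With `lam=1.0` the re-rank reduces to pure relevance order, which is the collapse described above; lowering it is the explicit fight for diversity.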

Where this journey goes next

Follow the Recommendation Systems Journey for: collaborative filtering, learned embeddings, ANN search at scale, retrieval architectures, and finally real-time serving topologies. Each post zooms in on one stage at a time.
