recommendation systems | intermediate | 6 min | 2026-05-26

Designing a recommendation system from scratch

Retrieval vs ranking, candidate generation, freshness vs relevance — the tradeoffs every real recommender lives by.

Every recommender system you've ever built (or ever will build) eventually becomes the same two-stage pipeline:

  1. Retrieval — narrow billions of items down to ~1000 candidates, fast.
  2. Ranking — score those 1000 candidates with a heavier model, return the top 20.

The art is choosing the right model for each stage and keeping them honest about what they're optimizing.
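To make the shape concrete, here is a minimal sketch of the two stages wired together. The names `recommend`, `Retriever` and `Scorer` are illustrative assumptions, not an established API; a production system would put an ANN index behind the retrievers and a trained model behind the scorer.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical interfaces, chosen only for this sketch.
Retriever = Callable[[str, int], List[str]]  # (user_id, limit) -> item ids
Scorer = Callable[[str, str], float]         # (user_id, item_id) -> score


def recommend(
    user_id: str,
    retrievers: Sequence[Retriever],
    scorer: Scorer,
    n_candidates: int = 1000,
    n_results: int = 20,
) -> List[str]:
    # Stage 1: retrieval. Union the candidate sources, dropping duplicates
    # while keeping first-seen order (dicts preserve insertion order).
    candidates: Dict[str, None] = {}
    for retrieve in retrievers:
        for item_id in retrieve(user_id, n_candidates):
            candidates.setdefault(item_id, None)
    pool = list(candidates)[:n_candidates]

    # Stage 2: ranking. Score only the small pool with the heavier model.
    ranked = sorted(pool, key=lambda item_id: scorer(user_id, item_id), reverse=True)
    return ranked[:n_results]
```

Note that truncating the union in first-seen order favors whichever retriever is listed first; that is a simplification of this sketch, not a recommendation.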

The two-stage pipeline

Retrieval stage

Retrieval has one job: make sure the right item is somewhere in the top-1000. Recall is everything. Common approaches:

  • Co-occurrence / matrix factorization — classic. Fast. Underrated.
  • Two-tower embeddings — user tower, item tower, dot product in latent space. ANN-served.
  • Heuristic rails — fresh, popular, geographic, "more from creator X". Always include these as candidate sources.
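As a rough illustration of the last two bullets, the sketch below scores items by dot product against a user embedding and then interleaves that list with heuristic rails. It uses an exact brute-force scan as a stand-in for ANN serving, and the function names (`retrieve_top_k`, `merge_sources`) and the round-robin blending policy are assumptions made for the example.

```python
import heapq
from typing import Dict, List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def retrieve_top_k(
    user_vec: Sequence[float],
    item_vecs: Dict[str, Sequence[float]],
    k: int,
) -> List[str]:
    # Exact scan over all items. An ANN index would replace this step
    # once the catalog is too large to scan per request.
    return heapq.nlargest(k, item_vecs, key=lambda item_id: dot(user_vec, item_vecs[item_id]))


def merge_sources(sources: List[List[str]], limit: int) -> List[str]:
    # Round-robin across candidate sources so no single source
    # (embeddings, fresh, popular, ...) crowds out the others.
    merged: List[str] = []
    seen = set()
    longest = max((len(source) for source in sources), default=0)
    for rank in range(longest):
        for source in sources:
            if rank < len(source) and source[rank] not in seen:
                seen.add(source[rank])
                merged.append(source[rank])
                if len(merged) == limit:
                    return merged
    return merged
```

Usage might look like `merge_sources([retrieve_top_k(user_vec, item_vecs, 800), fresh_items, popular_items], 1000)`, where `fresh_items` and `popular_items` are hypothetical precomputed rails.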

Ranking stage

Now you have 1000 candidates. Ranking decides the order. Features explode here: query × item × user × context × history. Models are typically gradient-boosted trees or deep cross-network architectures.
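To show what "features explode" means in practice, here is a small sketch that builds one-hot features plus pairwise crosses for a single (user, item, context) triple. The feature naming scheme and the linear scorer are assumptions for illustration; the linear model is only a stand-in for the gradient-boosted trees or deep networks mentioned above.

```python
from typing import Dict


def cross_features(
    user: Dict[str, str],
    item: Dict[str, str],
    context: Dict[str, str],
) -> Dict[str, float]:
    features: Dict[str, float] = {}

    # Unary one-hot features for each group.
    for group, values in (("user", user), ("item", item), ("ctx", context)):
        for name, value in values.items():
            features[f"{group}.{name}={value}"] = 1.0

    # Pairwise crosses: user x item and context x item.
    # The count grows with the product of the group sizes.
    for left_group, left in (("user", user), ("ctx", context)):
        for l_name, l_value in left.items():
            for i_name, i_value in item.items():
                key = f"{left_group}.{l_name}={l_value}&item.{i_name}={i_value}"
                features[key] = 1.0
    return features


def linear_score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    # Stand-in scorer: unseen features contribute zero.
    return sum(weights.get(name, 0.0) * value for name, value in features.items())
```

With one attribute per group this already yields five features; with dozens of attributes per group, the crosses dominate, which is why the ranking stage can only afford to run on about 1000 candidates.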

Retrieval vs ranking — at a glance

Stage      | Goal           | Latency budget | Model size  | Metric
Retrieval  | High recall    | < 30 ms        | Small / ANN | Recall@1000
Ranking    | High precision | < 100 ms       | Large / DNN | nDCG, AUC, business metric
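The two offline metrics in the table can be computed in a few lines. This is a minimal sketch with assumed function names; the nDCG here normalizes against the best ordering of the gains in the ranked list itself, so it assumes every relevant item made it into that list.

```python
import math
from typing import Iterable, List, Sequence


def recall_at_k(retrieved: Sequence[str], relevant: Iterable[str], k: int) -> float:
    # Fraction of the relevant items that appear in the top k retrieved.
    relevant_set = set(relevant)
    if not relevant_set:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant_set)
    return hits / len(relevant_set)


def dcg_at_k(gains: Sequence[float], k: int) -> float:
    # Position 0 is discounted by log2(2) = 1, position 1 by log2(3), ...
    return sum(gain / math.log2(pos + 2) for pos, gain in enumerate(gains[:k]))


def ndcg_at_k(ranked_gains: List[float], k: int) -> float:
    # ranked_gains[i] is the relevance label of the item shown at position i.
    ideal = dcg_at_k(sorted(ranked_gains, reverse=True), k)
    return dcg_at_k(ranked_gains, k) / ideal if ideal > 0 else 0.0
```

Recall ignores order entirely, which matches the retrieval stage's job; nDCG rewards putting the best items first, which matches the ranker's.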

Tradeoffs you can't avoid

  • Freshness vs relevance. A model trained through yesterday has the most signal on older items and none on what was published today; users want today's items.
  • Diversity vs engagement. Models will collapse to "show the user 5 things they obviously like". You'll have to fight for diversity explicitly.
  • Cold start. New users have no signal. New items have no clicks. Bootstrap with content embeddings + popularity priors.
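Each of these tradeoffs usually ends up as an explicit knob in code. The sketch below shows one common option per bullet: an exponential freshness decay blended into the score, a greedy maximal-marginal-relevance (MMR) re-rank for diversity, and a smoothed click-through rate as a popularity prior for cold items. All function names, default weights, half-lives and prior values are assumptions chosen for illustration, not tuned recommendations.

```python
from typing import Callable, Dict, List


def freshness_boost(
    score: float,
    age_hours: float,
    half_life_hours: float = 24.0,
    weight: float = 0.3,
) -> float:
    # Freshness halves every half_life_hours; weight sets how much it counts.
    freshness = 0.5 ** (age_hours / half_life_hours)
    return (1 - weight) * score + weight * freshness


def mmr_rerank(
    scores: Dict[str, float],
    similarity: Callable[[str, str], float],
    k: int,
    trade_off: float = 0.7,
) -> List[str]:
    # Greedy MMR: each pick trades relevance against similarity
    # to the items already selected.
    selected: List[str] = []
    remaining = set(scores)
    while remaining and len(selected) < k:
        def mmr(item: str) -> float:
            max_sim = max((similarity(item, s) for s in selected), default=0.0)
            return trade_off * scores[item] - (1 - trade_off) * max_sim

        best = max(sorted(remaining), key=mmr)  # sorted() makes ties deterministic
        selected.append(best)
        remaining.remove(best)
    return selected


def smoothed_ctr(
    clicks: int,
    impressions: int,
    prior_ctr: float = 0.02,
    prior_strength: float = 100.0,
) -> float:
    # A new item with no impressions gets prior_ctr; real data takes over
    # as impressions grow past prior_strength.
    return (clicks + prior_ctr * prior_strength) / (impressions + prior_strength)
```

For example, with two near-duplicate items scored 0.9 and 0.85 and an unrelated item at 0.6, `mmr_rerank` with `k=2` picks the top item and then the unrelated one, because the duplicate's similarity penalty outweighs its higher score.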

Where this journey goes next

Follow the Recommendation Systems Journey for: collaborative filtering, learned embeddings, ANN search at scale, retrieval architectures, and finally real-time serving topologies. Each post zooms in on one stage at a time.

