Search Tech Journey

Find topics, journeys and posts

back to blog
ai mladvanced 12m2026-06-16

Day 18 — Recommender Systems — Two-Tower, Multi-Stage Ranking

Recsys drives YouTube, TikTok, Amazon, Spotify, Pinterest — and is one of the highest-ROI ML problems anywhere. The two-tower retriever + multi-stage ranker is…

Modern recsys is multi-stage because the catalogue is huge (millions of items) but the model must respond in 100 ms. Each stage trades recall for precision.

🧠 Concept

Why it matters & the mental model.

1. The three stages

  • Retrieval (recall): from millions → 1k. Cheap, optimised for not missing relevant items.
  • Ranker (precision): from 1k → 100. Heavy features, expensive model.
  • Re-ranker (UX): diversity, freshness, fairness, business rules.

2. Retrieval methods

  • Collaborative filtering: matrix factorisation (ALS) — historical, still strong baseline.
  • Two-tower (DSSM): user tower + item tower → embeddings → ANN. Industry default.
  • Graph (GNN): PinSage, GraphSAGE — captures multi-hop similarity.
  • Sequence models (SASRec, BERT4Rec): treat user history as a sequence, predict next item.
  • LLM-augmented: use LLM to summarise user intent → embed → retrieve.

3. Two-tower in detail

  • User tower: f(user features, recent history) → user_emb.
  • Item tower: g(item features) → item_emb.
  • Train with in-batch negatives + sampled negatives + hard negatives mined from previous model.
  • Loss: InfoNCE / sampled softmax / pairwise BPR.
  • Index item embeddings in HNSW / ScaNN; query at runtime with user_emb.

🛠 Deep Dive

Internals, code, architecture.

4. Ranker

Heavy feature engineering:

  • User × item cross features (user's avg ctr on this item's category).
  • Counters (impression count last 7d, click rate last 1h).
  • Embeddings (user, item, category, author).
  • Context (time of day, device, locale). Model: GBDT (LightGBM) for tabular dominance, or DLRM-style deep + wide for embedding-heavy. Loss: logistic for click, multi-task for click+watch_time+like.

5. Multi-task & multi-objective

Real feeds optimise multiple objectives: click, watch time, like, save, follow. Architectures:

  • MMoE (multi-gate mixture of experts) — shared experts, task-specific gates.
  • PLE (progressive layered extraction). Final score = weighted sum (weights tuned via online A/B or constrained optimisation).

6. Negative sampling

Implicit feedback (clicks only, no ratings): every non-click is a potential negative, but most are uninformative.

  • Random: cheap, biased toward popular items.
  • Popularity-weighted: counters head bias.
  • Hard: items the current model ranks high but user didn't engage with.

7. Cold start

  • New user: rely on context features (location, signup source, day-1 actions).
  • New item: content features (text, image, taxonomy); contextual bandit for exploration.

🚀 In Practice

Trade-offs, exercises, what to ship today.

8. Online learning & exploration

Pure exploit = filter bubble + no learning on new content. Inject exploration:

  • ε-greedy: 5% random in slot N.
  • Contextual bandits (LinUCB, Thompson Sampling) for the exploration slot.
  • Off-policy evaluation (IPS, doubly robust) to measure new policies on logged data.

9. Diversity, freshness, fairness

Re-ranker uses DPP / MMR for diversity, decay for freshness, post-processing for fair exposure. Business rules: "no two items from same author in top 5".

10. Evaluation

  • Offline: NDCG, recall@k on logged interactions (with care — biased).
  • Online: A/B on CTR, watch time, retention. The truth.
  • Counterfactual: replay logged actions under new policy with importance sampling.

11. What to take away

"How would you build YouTube?" Strong answers: retrieve with two-tower, rank with multi-task GBDT or DLRM, re-rank for diversity, A/B with retention as north star. Bonus: cold-start strategy + exploration.

Key points

    Resources

    Practice Problem: Find K Closest Elements (Medium)