ai mladvanced 12m2026-06-16

Day 18 — Recommender Systems — Two-Tower, Multi-Stage Ranking

Recsys drives YouTube, TikTok, Amazon, Spotify, Pinterest — and is one of the highest-ROI ML problems anywhere. The two-tower retriever + multi-stage ranker is…

Modern recsys is multi-stage because the catalogue is huge (millions of items) but the model must respond in 100 ms. Each stage trades recall for precision.

1. The three stages

Retrieval (recall): from millions → 1k. Cheap, optimised for not missing relevant items.
Ranker (precision): from 1k → 100. Heavy features, expensive model.
Re-ranker (UX): diversity, freshness, fairness, business rules.

2. Retrieval methods

Collaborative filtering: matrix factorisation (ALS) — historical, still strong baseline.
Two-tower (DSSM): user tower + item tower → embeddings → ANN. Industry default.
Graph (GNN): PinSage, GraphSAGE — captures multi-hop similarity.
Sequence models (SASRec, BERT4Rec): treat user history as a sequence, predict next item.
LLM-augmented: use LLM to summarise user intent → embed → retrieve.

3. Two-tower in detail

User tower: f(user features, recent history) → user_emb.
Item tower: g(item features) → item_emb.
Train with in-batch negatives + sampled negatives + hard negatives mined from previous model.
Loss: InfoNCE / sampled softmax / pairwise BPR.
Index item embeddings in HNSW / ScaNN; query at runtime with user_emb.

4. Ranker

Heavy feature engineering:

User × item cross features (user's avg ctr on this item's category).
Counters (impression count last 7d, click rate last 1h).
Embeddings (user, item, category, author).
Context (time of day, device, locale). Model: GBDT (LightGBM) for tabular dominance, or DLRM-style deep + wide for embedding-heavy. Loss: logistic for click, multi-task for click+watch_time+like.

5. Multi-task & multi-objective

Real feeds optimise multiple objectives: click, watch time, like, save, follow. Architectures:

MMoE (multi-gate mixture of experts) — shared experts, task-specific gates.
PLE (progressive layered extraction). Final score = weighted sum (weights tuned via online A/B or constrained optimisation).

6. Negative sampling

Implicit feedback (clicks only, no ratings): every non-click is a potential negative, but most are uninformative.

Random: cheap, biased toward popular items.
Popularity-weighted: counters head bias.
Hard: items the current model ranks high but user didn't engage with.

7. Cold start

New user: rely on context features (location, signup source, day-1 actions).
New item: content features (text, image, taxonomy); contextual bandit for exploration.

🚀 In Practice

Trade-offs, exercises, what to ship today.

8. Online learning & exploration

Pure exploit = filter bubble + no learning on new content. Inject exploration:

ε-greedy: 5% random in slot N.
Contextual bandits (LinUCB, Thompson Sampling) for the exploration slot.
Off-policy evaluation (IPS, doubly robust) to measure new policies on logged data.

9. Diversity, freshness, fairness

Re-ranker uses DPP / MMR for diversity, decay for freshness, post-processing for fair exposure. Business rules: "no two items from same author in top 5".

10. Evaluation

Offline: NDCG, recall@k on logged interactions (with care — biased).
Online: A/B on CTR, watch time, retention. The truth.
Counterfactual: replay logged actions under new policy with importance sampling.

"How would you build YouTube?" Strong answers: retrieve with two-tower, rank with multi-task GBDT or DLRM, re-rank for diversity, A/B with retention as north star. Bonus: cold-start strategy + exploration.

Key points

Resources

Practice Problem: Find K Closest Elements (Medium)

← previous

Day 17 — Streaming with Flink / Spark Structured Streaming — Watermarks & Windows

Day 19 — Designing a Chat / Messaging System at Scale