Learning how to build a recommendation system from initial signals
Starting from a few initial adopters of a product, how can we target a new set of users who are more likely to use it?
Building a Targeting System from Early Adopter Signals
Goal: Given a small set of early adopters, build a scoring model to identify users most likely to adopt a product.
Git Repo: github.com/dinesh-coderepo/targetting-system
Key Concepts at a Glance
| System | How It Works | Example |
|---|---|---|
| Recommendation | Learn from a user's own patterns → extend to similar items | "You watched X, try Y" |
| Targeting | Learn from early adopters' profiles → find similar non-adopters | "Users like your best customers" |
| Cold Start | Very few signals → traditional collaborative filtering fails | This blog's core challenge |
System Architecture
Background & Prerequisites
1. Types of Recommendation Systems
| Approach | How It Works | Pros | Cons |
|---|---|---|---|
| User-based CF | Find similar users → recommend their preferences | Intuitive | Doesn't scale; sparse |
| Item-based CF | Find similar items → recommend to users who liked them | Stable | Needs interaction data |
| Matrix Factorization | Decompose user-item matrix into latent factors | Handles sparsity | Cold start problem |
| Content-Based | Match item features to user preferences | No cold start for items | Limited to feature quality |
| Hybrid | Combine CF + content-based | Best of both worlds | Complex to implement |
Note: Netflix, Spotify, and YouTube all use hybrid approaches that combine multiple methods.
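To make the user-based row of the table concrete, here is a minimal sketch of user-based collaborative filtering with cosine similarity. The toy matrix and the function names `cosine_similarity_matrix` and `recommend_user_based` are illustrative assumptions, not code from the repo.

```python
import numpy as np

def cosine_similarity_matrix(m: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of m."""
    norms = np.linalg.norm(m, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for users with no interactions
    unit = m / norms
    return unit @ unit.T

def recommend_user_based(interactions: np.ndarray, user: int, top_n: int = 2) -> list[int]:
    """Rank items the user has not interacted with by similarity-weighted votes."""
    sim = cosine_similarity_matrix(interactions)[user].copy()
    sim[user] = 0.0  # a user should not vote for themselves
    scores = sim @ interactions  # weighted sum of the other users' interactions
    scores[interactions[user] > 0] = -np.inf  # hide items the user already has
    ranked = np.argsort(-scores)
    return [int(i) for i in ranked[:top_n] if np.isfinite(scores[i])]

# Toy user-item matrix (rows = users, columns = items), 1 = interacted
interactions = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
], dtype=float)

recs = recommend_user_based(interactions, user=0)
print(recs)  # item 2 ranks first because the most similar user (row 1) has it
```

With only a handful of rows the similarity matrix is cheap, but it grows quadratically with the number of users, which is the scaling problem the table refers to.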
2. The Cold Start Problem
This is the core challenge for this blog: very few adopters means extreme data sparsity.
Solutions for targeting with few adopters:
- Feature similarity: match non-adopters against adopter feature profiles
- Lookalike modeling: find users who "look like" early adopters (demographics + behavior)
- Propensity scoring: train a binary classifier, adopter (1) vs non-adopter (0)
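As one possible reading of the lookalike idea, the sketch below scores every user by cosine similarity to the average (centroid) adopter profile after standardising the features. The two columns and the function name `lookalike_scores` are assumptions made for illustration.

```python
import numpy as np

def lookalike_scores(features: np.ndarray, is_adopter: np.ndarray) -> np.ndarray:
    """Cosine similarity of every user to the mean (centroid) adopter profile."""
    # Standardise columns so no single feature dominates the similarity
    mu = features.mean(axis=0)
    sigma = features.std(axis=0)
    sigma[sigma == 0] = 1.0
    z = (features - mu) / sigma
    centroid = z[is_adopter].mean(axis=0)
    denom = np.linalg.norm(z, axis=1) * np.linalg.norm(centroid)
    denom[denom == 0] = 1.0
    return (z @ centroid) / denom

# Hypothetical columns: logins_30d, features_used
features = np.array([
    [30.0, 12.0],  # adopter
    [28.0, 10.0],  # adopter
    [27.0, 11.0],  # non-adopter who looks like the adopters
    [2.0, 1.0],    # non-adopter, low engagement
    [1.0, 2.0],    # non-adopter, low engagement
])
is_adopter = np.array([True, True, False, False, False])

scores = lookalike_scores(features, is_adopter)
print(np.round(scores, 3))
```

This needs no labelled negatives, so it can serve as a fallback when there are too few adopters to train a classifier.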
3. Propensity / Targeting Model
The heart of this project: scoring every user by their likelihood to adopt.
Feature Categories:
| Category | Example Features |
|---|---|
| Demographic | Age, location, job title, industry |
| Behavioral | Login frequency, feature usage, time spent, page views |
| Social | Connections to existing adopters, team adoption rate |
| Temporal | Recency, frequency, monetary (RFM analysis) |
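The temporal row can be sketched as a small RFM aggregation in pandas. The `events` table and its columns (`user_id`, `event_date`, `amount`) are hypothetical; the point is that features are computed only from events before the snapshot date, so they cannot leak the outcome.

```python
import pandas as pd

def rfm_features(events: pd.DataFrame, as_of: str) -> pd.DataFrame:
    """Per-user recency (days), frequency (event count), and monetary (total amount)."""
    as_of_ts = pd.Timestamp(as_of)
    past = events[events["event_date"] < as_of_ts]  # only use data before the snapshot
    grouped = past.groupby("user_id").agg(
        last_event=("event_date", "max"),
        frequency=("event_date", "count"),
        monetary=("amount", "sum"),
    )
    grouped["recency_days"] = (as_of_ts - grouped["last_event"]).dt.days
    return grouped[["recency_days", "frequency", "monetary"]].reset_index()

# Hypothetical event log
events = pd.DataFrame({
    "user_id": ["u1", "u1", "u2", "u2"],
    "event_date": pd.to_datetime(["2025-05-01", "2025-05-20", "2025-04-10", "2025-06-15"]),
    "amount": [10.0, 15.0, 40.0, 99.0],
})

rfm = rfm_features(events, as_of="2025-06-01")
print(rfm)
```

The last event for `u2` falls after the snapshot and is ignored, which is the behaviour a time-based evaluation relies on.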
Model Choices:
| Model | When to Use |
|---|---|
| Logistic Regression | Baseline: interpretable and fast, with coefficients that can be read as odds ratios. |
| Random Forest / XGBoost | Better accuracy, non-linear relationships, feature importance |
| Neural Networks | Large-scale datasets with many features |
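To illustrate the odds-ratio reading of a logistic regression baseline, the sketch below fits scikit-learn's `LogisticRegression` on simulated data and exponentiates the coefficients. The data, the feature names, and the true effect size are all made up for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000
logins = rng.normal(size=n)  # standardised, hypothetical feature
noise = rng.normal(size=n)   # feature unrelated to adoption

# Simulated ground truth: each +1 SD in logins multiplies the odds of adopting by e^1.5
logit = -2.0 + 1.5 * logins
y = rng.random(n) < 1 / (1 + np.exp(-logit))

X = np.column_stack([logins, noise])
model = LogisticRegression().fit(X, y)

# exp(coefficient) = multiplicative change in odds per one-unit increase in the feature
odds_ratios = np.exp(model.coef_[0])
for name, ratio in zip(["logins", "noise"], odds_ratios):
    print(f"{name}: odds ratio = {ratio:.2f}")
```

An odds ratio well above 1 marks a feature that raises the odds of adoption, while a value near 1 marks a feature that carries little signal. Because scikit-learn applies L2 regularisation by default, the estimates are shrunk slightly towards 1.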
Warning, class imbalance: if only 1% of users are adopters, a naive model can reach 99% accuracy by always predicting "no". Use SMOTE (oversampling), class weights, focal loss, or undersampling.
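Of these options, class weights are the cheapest to try. The sketch below reproduces the heuristic behind scikit-learn's `class_weight="balanced"`, which weights each class by `n_samples / (n_classes * class_count)`. The helper name `balanced_class_weights` is an assumption for illustration.

```python
import numpy as np

def balanced_class_weights(y: np.ndarray) -> dict[int, float]:
    """Weight each class inversely to its frequency: n_samples / (n_classes * count)."""
    classes, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(classes) * counts)
    return {int(c): float(w) for c, w in zip(classes, weights)}

# 1% adopters: 10 positives among 1,000 users
y = np.array([1] * 10 + [0] * 990)
weights = balanced_class_weights(y)
print(weights)  # each adopter counts roughly 99 times as much as a non-adopter
```

With these weights both classes contribute equally to the loss in total, so the model can no longer minimise it by ignoring the adopters.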
4. Evaluation Metrics
| Metric | What It Measures | Why It Matters |
|---|---|---|
| AUC-ROC | Discrimination ability across thresholds | Best single metric for targeting |
| Precision@K | Of top K predictions, how many are actual adopters | Directly measures targeting quality |
| Recall@K | Of all adopters, how many are in top K | Did we find most adopters? |
| Lift Chart | How much better than random selection | "Top 10% scored 5x more likely than random" |
| NDCG | Ranking quality with position weighting | Are true adopters ranked highest? |
Warning: Never use accuracy with imbalanced data; it is misleading.
Warning: Never split randomly; use time-based splits (train on past, test on future) to prevent data leakage.
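The top-K metrics from the table are simple to compute by hand. The following is a minimal sketch with made-up labels and scores; `precision_recall_at_k` is a hypothetical helper, not a scikit-learn function.

```python
import numpy as np

def precision_recall_at_k(y_true, y_score, k: int) -> tuple[float, float]:
    """Precision@K and Recall@K for a ranking induced by y_score."""
    y_true = np.asarray(y_true)
    order = np.argsort(-np.asarray(y_score), kind="stable")  # highest score first
    hits = y_true[order[:k]].sum()  # true adopters among the top K
    total_pos = y_true.sum()
    precision = hits / k
    recall = hits / total_pos if total_pos > 0 else 0.0
    return float(precision), float(recall)

# Toy example: 3 adopters among 10 users
y_true = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]

p_at_3, r_at_3 = precision_recall_at_k(y_true, y_score, k=3)
base_rate = sum(y_true) / len(y_true)
lift_at_3 = p_at_3 / base_rate  # how much better than targeting at random
print(f"P@3={p_at_3:.3f} R@3={r_at_3:.3f} lift@3={lift_at_3:.2f}")
```

Choosing K to match the real campaign size (for example, how many users sales can contact) keeps the metric tied to the actual targeting decision.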
5. Tools & Libraries
| Library | Purpose |
|---|---|
| scikit-learn | LogisticRegression, RandomForest, metrics, pipelines |
| xgboost / lightgbm | Gradient boosting for targeting models |
| surprise | Collaborative filtering (SVD, KNN, NMF) |
| lightfm | Hybrid recommendations (collaborative + content) |
| implicit | Implicit feedback models (ALS, BPR) |
| pandas + numpy | Data manipulation & feature engineering |
| matplotlib + seaborn | Visualization (lift charts, ROC curves) |
TODO: Remaining Work
| # | Task | Priority |
|---|---|---|
| 1 | Implement basic collaborative filtering (user-item matrix, cosine similarity) | High |
| 2 | Implement matrix factorization (SVD) with Surprise | High |
| 3 | Build propensity model with logistic regression | High |
| 4 | Feature engineering pipeline (behavioral + demographic) | High |
| 5 | Handle class imbalance (SMOTE, class weights) | Medium |
| 6 | Evaluate with AUC-ROC, lift charts, decile analysis | Medium |
| 7 | Build cold-start fallback strategy | Medium |
| 8 | Compare model approaches in a results table | Medium |
| 9 | Add Mermaid architecture diagram of full targeting pipeline | Low |
| 10 | Connect to Monolith paper learnings | Low |
Reference Implementation: Propensity Model with Lookalike Scoring
A minimal but complete pipeline: given a tiny set of adopters, score every non-adopter for likelihood to adopt.
# targeting.py
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score, average_precision_score


def build_dataset(users: pd.DataFrame, adopters: set[str]) -> tuple[pd.DataFrame, pd.Series]:
    """users has columns: user_id, age, logins_30d, features_used, tenure_days, team_size, industry.
    adopters is a set of user_ids that already converted."""
    df = users.copy()
    df["label"] = df["user_id"].isin(adopters).astype(int)
    y = df.pop("label")
    X = pd.get_dummies(df.drop(columns=["user_id"]), columns=["industry"], drop_first=True)
    return X, y


def train(X: pd.DataFrame, y: pd.Series):
    # A random stratified split keeps this example short;
    # prefer a time-based split for the final evaluation.
    X_tr, X_val, y_tr, y_val = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=42
    )
    scaler = StandardScaler().fit(X_tr)
    X_tr_s = scaler.transform(X_tr)
    X_val_s = scaler.transform(X_val)

    # Baseline: logistic regression with class_weight for imbalance
    lr = LogisticRegression(class_weight="balanced", max_iter=500).fit(X_tr_s, y_tr)
    # Stronger: gradient boosting handles non-linear interactions (trees need no scaling)
    gb = GradientBoostingClassifier(n_estimators=200, max_depth=3).fit(X_tr, y_tr)

    for name, model, X_eval in [("logreg", lr, X_val_s), ("gbm", gb, X_val)]:
        p = model.predict_proba(X_eval)[:, 1]
        print(f"{name}: AUC={roc_auc_score(y_val, p):.3f} "
              f"PR-AUC={average_precision_score(y_val, p):.3f}")
    return gb, scaler


def score_and_rank(model, users: pd.DataFrame, adopters: set[str], top_k: int = 1000):
    """Score all non-adopters and return the top-K targets."""
    non_adopters = users[~users["user_id"].isin(adopters)].copy()
    X = pd.get_dummies(non_adopters.drop(columns=["user_id"]),
                       columns=["industry"], drop_first=True)
    # Align with the training columns: a subset of users can yield different dummy columns
    X = X.reindex(columns=model.feature_names_in_, fill_value=0)
    non_adopters["score"] = model.predict_proba(X)[:, 1]
    ranked = non_adopters.sort_values("score", ascending=False)
    base_rate = len(adopters) / len(users)
    top = ranked.head(top_k)
    # Lift = adoption rate in top-K / random base rate.
    # True labels are unknown for non-adopters, so measure lift on a held-out set in practice.
    print(f"Base adoption rate: {base_rate:.3%} | Targeting top {top_k} users")
    return top[["user_id", "score"]]
Evaluating with a Proper Time-Based Split
Random splits leak future information. In targeting, users who adopted at time T did so because of their behaviour before T, so the model should be trained on the earlier cohort and tested on the later one. Evaluate like this:
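# Assumes `users` (with a signup_date column) and `adopter_events`
# (with user_id and event_date columns) are already loaded as DataFrames.
# Split by signup date, not randomly
cutoff = "2025-06-01"
train_users = users[users["signup_date"] < cutoff]
test_users = users[users["signup_date"] >= cutoff]

# Adopters in each cohort, as sets so they can be passed to build_dataset
train_adopters = set(adopter_events.query("event_date < @cutoff")["user_id"])
test_adopters = set(adopter_events.query("event_date >= @cutoff")["user_id"])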
Lift Chart: The Right Way to Present Results
def lift_chart(y_true, y_score, deciles=10):
    df = pd.DataFrame({"y": y_true, "p": y_score}).sort_values("p", ascending=False)
    # Rank first so tied scores still split into equal-sized buckets; the highest bucket holds the top scores
    df["decile"] = pd.qcut(df["p"].rank(method="first"), deciles, labels=False)
    base = df["y"].mean()
    table = df.groupby("decile")["y"].mean().rename("rate").to_frame()
    table["lift"] = table["rate"] / base
    return table.sort_index(ascending=False)
A healthy targeting model shows the top decile at 3 to 10× lift over baseline. If the top decile is only 1.5×, your features aren't predictive; go back to feature engineering before tuning the model.
When every TODO above is ticked and your lift chart shows ≥ 3× in the top decile on a time-based test set, flip this post to status: published.