ai ml | intermediate | 6m | 2026-06-09

The 48-Session Learning Series — A Planning Guide

A 48-session deep-prep planning guide across LLMs, ML, system design, data engineering and OOP. Self-paced — pick your own cadence.

A structured planning guide covering 48 sessions across the LLM / ML / system-design / data-engineering / OOP stack. It is a rebuilt, re-paced version of an earlier 28-day deep-prep plan, split into 48 thinner sessions so that each topic gets the breathing room it deserves.

This is not a calendar. It's a menu. Pick a cadence that suits you — daily, weekly, weekends-only, sprints — and work through the sessions in any order. The tracks are interleaved by design, so you can skip around without losing context.

Why 48 sessions

The original 28-day plan looked great on paper. In practice, several days tried to cram a full afternoon of material into one window. A two-hour window for "transformers from scratch" or "designing Twitter" does not deliver the depth it promises.

So the heavy topics got split, a few were extended into adjacent specialist sessions, and the result is 48 thinner sessions across the same surface area. Same goal — be genuinely fluent across the stack. Better pacing.

Suggested ways to use it

  • Sprint mode — 1 session/day across ~10 weeks of weekdays, weekends off.
  • Weekend mode — 2 sessions/Saturday + 2/Sunday → ~12 weeks.
  • Topic mode — pick a track (e.g. LLM) and walk it end-to-end before moving on.
  • Interview cram — pick the 20–25 sessions that match the role you're targeting.

Whatever cadence you pick, the rule of thumb is: one session is one focused two-hour block, ideally with a notebook and a code editor open.
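
The cadence arithmetic can be checked with a few lines of Python. This is a minimal sketch, not part of the original plan: `weeks_needed` is a hypothetical helper, and the weekly paces (5 sessions for weekday sprints, 4 for two-per-day weekends) are assumptions read off the modes listed above.

```python
import math

TOTAL_SESSIONS = 48      # from the guide
HOURS_PER_SESSION = 2    # "one focused two-hour block"


def weeks_needed(sessions_per_week: int, total: int = TOTAL_SESSIONS) -> int:
    """Whole weeks needed to finish `total` sessions at a steady weekly pace."""
    if sessions_per_week <= 0:
        raise ValueError("sessions_per_week must be positive")
    return math.ceil(total / sessions_per_week)


# Assumed weekly paces: 5 for weekday sprints, 4 for two-per-day weekends.
sprint_weeks = weeks_needed(5)
weekend_weeks = weeks_needed(4)
total_hours = TOTAL_SESSIONS * HOURS_PER_SESSION

print(f"sprint: {sprint_weeks} weeks, weekend: {weekend_weeks} weeks, "
      f"total focused time: {total_hours} hours")
```

Swapping in your own pace (say, 3 sessions a week) gives a rough finish line without turning the menu back into a calendar.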

Tracks

  • 🧠 LLM — LLMs & agents (13 sessions)
  • 📈 ML — Machine learning (7 sessions)
  • 🏗️ SYS — System design (10 sessions)
  • 🗂️ DE — Data engineering (11 sessions)
  • 🧱 OOP — OOP & languages (7 sessions)
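
As a quick sanity check on the track sizes, the sketch below tallies the counts listed above and shows how a track combination might be sized for the interview-cram mode. The names `TRACK_SESSIONS` and `sessions_in` are illustrative, and pairing LLM with SYS is just one assumed example of a role-specific pick.

```python
# Session counts per track, copied from the list above.
TRACK_SESSIONS = {"LLM": 13, "ML": 7, "SYS": 10, "DE": 11, "OOP": 7}
TOTAL = sum(TRACK_SESSIONS.values())


def sessions_in(tracks: list[str]) -> int:
    """Number of sessions covered by walking the given tracks end-to-end."""
    return sum(TRACK_SESSIONS[t] for t in tracks)


# Print each track's size and its share of the whole series, largest first.
for track, count in sorted(TRACK_SESSIONS.items(), key=lambda kv: -kv[1]):
    print(f"{track:<4}{count:>3}  {count / TOTAL:.0%}")

# Example pick: the LLM and SYS tracks together.
print("LLM + SYS:", sessions_in(["LLM", "SYS"]), "sessions")
```

Two full tracks can already land inside the 20–25 session range suggested for an interview cram.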

What each session contains

  1. Agenda — 5 bullets, what the session covers
  2. Pre-read — 3–5 papers / blog posts / official docs to skim before
  3. Deep dive — explanations, math where useful, ASCII diagrams, code, real production numbers
  4. Reading material — books / papers / docs to come back to later
  5. In-depth research material — curated external links
  6. Video reference — one hand-picked YouTube video
  7. LeetCode problem — URL + difficulty + 2-line hint
  8. Post-session checklist — what you should be able to do or explain by the end
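
One way to keep session notes consistent with this template is to model the eight parts as a small data structure. The sketch below is an assumption about how that could look, not the layout used in the repo: the `Session` and `LeetCodeProblem` classes, their field names, and the `problems` check are all hypothetical, and the example values are placeholders.

```python
from dataclasses import dataclass


@dataclass
class LeetCodeProblem:
    url: str
    difficulty: str
    hint: str  # the template asks for a 2-line hint


@dataclass
class Session:
    number: int
    track: str
    title: str
    agenda: list[str]            # 1. Agenda (5 bullets)
    pre_read: list[str]          # 2. Pre-read (3 to 5 items)
    deep_dive: str               # 3. Deep dive
    reading_material: list[str]  # 4. Reading material
    research_links: list[str]    # 5. In-depth research material
    video_reference: str         # 6. Video reference
    leetcode: LeetCodeProblem    # 7. LeetCode problem
    checklist: list[str]         # 8. Post-session checklist

    def problems(self) -> list[str]:
        """Return deviations from the template; an empty list means none found."""
        issues = []
        if len(self.agenda) != 5:
            issues.append(f"agenda has {len(self.agenda)} bullets, expected 5")
        if not 3 <= len(self.pre_read) <= 5:
            issues.append(f"pre-read has {len(self.pre_read)} items, expected 3 to 5")
        if len(self.leetcode.hint.splitlines()) > 2:
            issues.append("LeetCode hint is longer than 2 lines")
        if not self.checklist:
            issues.append("post-session checklist is empty")
        return issues


# Placeholder content only; the real session files are not reproduced here.
draft = Session(
    number=1,
    track="LLM",
    title="Transformers Part 1",
    agenda=["bullet 1", "bullet 2", "bullet 3", "bullet 4"],  # one short on purpose
    pre_read=["item 1", "item 2", "item 3"],
    deep_dive="...",
    reading_material=[],
    research_links=[],
    video_reference="https://example.com/video",
    leetcode=LeetCodeProblem("https://example.com/problem", "Medium", "line 1\nline 2"),
    checklist=["placeholder checklist item"],
)
print(draft.problems())
```

A check like this is mainly useful when drafting your own notes for a session: it flags a thin agenda or a missing checklist before you sit down for the two-hour block.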

The 48 sessions

#  | Track  | Title
01 | 🧠 LLM | Transformers Part 1 — Attention, Q/K/V, Multi-Head
02 | 🗂️ DE | Spark Part 1 — Driver, Executors, RDDs, Lazy Evaluation
03 | 🏗️ SYS | URL Shortener Part 1 — Numbers, IDs, Storage
04 | 🧱 OOP | SOLID Part 1 — SRP, OCP, LSP with Python Examples
05 | 📈 ML | Gradient Boosted Trees Part 1 — Boosting Intuition, Trees, Loss
06 | 🧠 LLM | Transformers Part 2 — Positional Encoding, RoPE, MLP, LayerNorm
07 | 🗂️ DE | Spark Part 2 — Shuffles, Catalyst, AQE, Tuning
08 | 🏗️ SYS | URL Shortener Part 2 — Cache, CDN, Hot Keys, Abuse
09 | 🧠 LLM | RAG Part 1 — Why, Chunking, Embeddings, Vector Stores
10 | 🗂️ DE | Kafka Part 1 — Brokers, Topics, Partitions, Producers
11 | 🧱 OOP | SOLID Part 2 — ISP, DIP, and Design Patterns (Strategy, Factory, Observer)
12 | 🏗️ SYS | CAP, PACELC, Quorums — How Distributed Systems Actually Trade Off
13 | 🧠 LLM | RAG Part 2 — Retrieval, Re-Ranking, Generation, Evaluation
14 | 📈 ML | GBDT Part 2 — XGBoost, LightGBM, Regularisation, In-Practice Tuning
15 | 🗂️ DE | Kafka Part 2 — Replication, ISR, Consumer Groups, Exactly-Once
16 | 🧱 OOP | Concurrency Models — Threads, Asyncio, GIL, Actors
17 | 🧠 LLM | Embeddings, Vector Spaces, Contrastive Learning
18 | 🏗️ SYS | Sharding & Replication — Partition Keys, Hot Spots, Multi-Region
19 | 🗂️ DE | Lakehouse — Delta Lake, Iceberg, Hudi, ACID on Object Storage
20 | 🧱 OOP | Memory Model, GC, Heap, GC Leaks, Profiling
21 | 🧠 LLM | Function Calling, Tool Use, Agentic Loops
22 | 🏗️ SYS | Designing a Chat System — Connections, Fanout, Storage, Delivery
23 | 🗂️ DE | Streaming with Flink/Spark — Watermarks, Windows, State
24 | 📈 ML | MLOps — Experiment Tracking, Model Registry, CI/CD for Models
25 | 🧠 LLM | LLM Evaluation — Benchmarks, LLM-as-Judge, RAGAS, Inspect
26 | 🏗️ SYS | Caching Strategies — CDN, Application Cache, Cache-Aside, Read-Through
27 | 📈 ML | Recommender Systems — Two-Tower, Multi-Stage Ranking
28 | 🗂️ DE | Data Modelling — Dimensional, Data Vault, OBT for the Lakehouse
29 | 🧱 OOP | Idiomatic Python (and a Touch of C++) — Type Hints, Protocols, Dataclasses
30 | 🧠 LLM | LLM Serving Part 1 — vLLM, KV Cache, Continuous Batching
31 | 🏗️ SYS | API Design — REST, GraphQL, gRPC, Versioning, Pagination, Errors
32 | 🗂️ DE | Data Governance — Lineage, Quality, Catalogs, Contracts, Observability
33 | 📈 ML | Practical Fine-Tuning — LoRA, QLoRA, PEFT, Instruction Datasets
34 | 🧠 LLM | LLM Serving Part 2 — Speculative Decoding, Quantisation, Throughput
35 | 🏗️ SYS | News Feed / Timeline System — Fanout-on-Read vs Write, Ranking
36 | 🧠 LLM | Multimodal LLMs — Vision, Language, Audio, Tool Use Combined
37 | 🗂️ DE | Petabyte Cost Optimisation — Compression, Partitioning, Z-Order, File Sizing
38 | 📈 ML | Feature Engineering & Feature Stores at Scale
39 | 🏗️ SYS | Designing a Distributed Job Queue — Reliability, Backoff, Idempotency
40 | 🗂️ DE | Change Data Capture — Debezium, Outbox Pattern, Snapshot+Stream
41 | 🧱 OOP | Testing, Mocks, Property-Based Tests, Mutation Testing
42 | 🧠 LLM | Prompt Engineering at Production Scale — Templates, Caching, Drift
43 | 📈 ML | Online Learning, Bandits, Counterfactual Evaluation
44 | 🏗️ SYS | Designing a Search Engine — Crawl, Index, Query, Ranking
45 | 🧠 LLM | LLM Safety — Jailbreaks, Prompt Injection, Output Filtering, Red-Teaming
46 | 🗂️ DE | Observability for Data Pipelines — SLAs, SLOs, Freshness, Data Tests
47 | 🧱 OOP | Production Error Handling — Retries, Circuit Breakers, Timeouts, Bulkheads
48 | 🧠 LLM | Capstone — Building a Production AI Agent End-to-End

Source

Repo: dinesh-coderepo/preparation/48-sessions

Every session is open. Skip the ones you already know; double down on the ones that bite. There's no deadline — just steady, paced work.