The 48-Session Learning Series — A Planning Guide
ai ml · intermediate · 6m · 2026-06-09
A 48-session deep-prep planning guide across LLMs, ML, system design, data engineering and OOP. Self-paced — pick your own cadence.
A structured planning guide covering 48 sessions across the LLM / ML / system-design / data-engineering / OOP stack. It's the rebuilt-and-paced version of an earlier 28-day deep-prep plan, broken into 48 thinner sessions so each topic gets the breathing room it deserves.
This is not a calendar. It's a menu. Pick a cadence that suits you — daily, weekly, weekends-only, sprints — and work through the sessions in any order. The tracks are interleaved by design, so you can skip around without losing context.
The original 28-day plan looked great on paper. In practice, several days were trying to cram a full afternoon of material into one window. Two-hour windows for "transformers from scratch" or "designing Twitter" don't actually deliver the depth they promise.
So the heavy topics got split, a few were extended into adjacent specialist sessions, and the result is 48 thinner sessions across the same surface area. Same goal — be genuinely fluent across the stack. Better pacing.
Sprint mode — 1 session/day across ~10 weeks of weekdays, weekends off.
Weekend mode — 2 sessions/Saturday + 2/Sunday → ~12 weeks.
Topic mode — pick a track (e.g. LLM) and walk it end-to-end before moving on.
Interview cram — pick the 20–25 sessions that match the role you're targeting.
Whatever cadence you pick, the rule of thumb is: one session is one focused two-hour block, ideally with a notebook and a code editor open.
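The cadence arithmetic is just the session count divided by sessions per week, rounded up. A minimal sketch, assuming five sessions a week for sprint mode and four for weekend mode; the function name `weeks_needed` and the `cadences` dictionary are illustrative and not part of the series:

```python
import math

TOTAL_SESSIONS = 48      # from the guide
HOURS_PER_SESSION = 2    # "one focused two-hour block"


def weeks_needed(sessions_per_week: int, total: int = TOTAL_SESSIONS) -> int:
    """Whole calendar weeks needed to finish `total` sessions."""
    if sessions_per_week <= 0:
        raise ValueError("sessions_per_week must be positive")
    return math.ceil(total / sessions_per_week)


# Assumed weekly loads for two of the cadences described above.
cadences = {
    "sprint (1 per weekday)": 5,
    "weekend (2 Sat + 2 Sun)": 4,
}

for name, per_week in cadences.items():
    print(f"{name}: {weeks_needed(per_week)} weeks")

print(f"total focused time: {TOTAL_SESSIONS * HOURS_PER_SESSION} hours")
```

Swapping in your own sessions-per-week figure gives a rough finish date for any other cadence, including a trimmed interview-cram list if you pass a smaller `total`.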
The sessions are split across five tracks:
🧠 LLM — LLMs & agents (13 sessions)
📈 ML — Machine learning (7 sessions)
🏗️ SYS — System design (10 sessions)
🗂️ DE — Data engineering (11 sessions)
🧱 OOP — OOP & languages (7 sessions)
Every session follows the same structure:
Agenda — 5 bullets, what the session covers
Pre-read — 3–5 papers / blog posts / official docs to skim before
Deep dive — explanations, math where useful, ASCII diagrams, code, real production numbers
Reading material — books / papers / docs to come back to later
In-depth research material — curated external links
Video reference — one hand-picked YouTube video
LeetCode problem — URL + difficulty + 2-line hint
Post-session checklist — what you should be able to do or explain by the end
# | Track | Title
01 | 🧠 LLM | Transformers Part 1 — Attention, Q/K/V, Multi-Head
02 | 🗂️ DE | Spark Part 1 — Driver, Executors, RDDs, Lazy Evaluation
03 | 🏗️ SYS | URL Shortener Part 1 — Numbers, IDs, Storage
04 | 🧱 OOP | SOLID Part 1 — SRP, OCP, LSP with Python Examples
05 | 📈 ML | Gradient Boosted Trees Part 1 — Boosting Intuition, Trees, Loss
06 | 🧠 LLM | Transformers Part 2 — Positional Encoding, RoPE, MLP, LayerNorm
07 | 🗂️ DE | Spark Part 2 — Shuffles, Catalyst, AQE, Tuning
08 | 🏗️ SYS | URL Shortener Part 2 — Cache, CDN, Hot Keys, Abuse
09 | 🧠 LLM | RAG Part 1 — Why, Chunking, Embeddings, Vector Stores
10 | 🗂️ DE | Kafka Part 1 — Brokers, Topics, Partitions, Producers
11 | 🧱 OOP | SOLID Part 2 — ISP, DIP, and Design Patterns (Strategy, Factory, Observer)
12 | 🏗️ SYS | CAP, PACELC, Quorums — How Distributed Systems Actually Trade Off
13 | 🧠 LLM | RAG Part 2 — Retrieval, Re-Ranking, Generation, Evaluation
14 | 📈 ML | GBDT Part 2 — XGBoost, LightGBM, Regularisation, In-Practice Tuning
15 | 🗂️ DE | Kafka Part 2 — Replication, ISR, Consumer Groups, Exactly-Once
16 | 🧱 OOP | Concurrency Models — Threads, Asyncio, GIL, Actors
17 | 🧠 LLM | Embeddings, Vector Spaces, Contrastive Learning
18 | 🏗️ SYS | Sharding & Replication — Partition Keys, Hot Spots, Multi-Region
19 | 🗂️ DE | Lakehouse — Delta Lake, Iceberg, Hudi, ACID on Object Storage
20 | 🧱 OOP | Memory Model, GC, Heap, GC Leaks, Profiling
21 | 🧠 LLM | Function Calling, Tool Use, Agentic Loops
22 | 🏗️ SYS | Designing a Chat System — Connections, Fanout, Storage, Delivery
23 | 🗂️ DE | Streaming with Flink/Spark — Watermarks, Windows, State
24 | 📈 ML | MLOps — Experiment Tracking, Model Registry, CI/CD for Models
25 | 🧠 LLM | LLM Evaluation — Benchmarks, LLM-as-Judge, RAGAS, Inspect
26 | 🏗️ SYS | Caching Strategies — CDN, Application Cache, Cache-Aside, Read-Through
27 | 📈 ML | Recommender Systems — Two-Tower, Multi-Stage Ranking
28 | 🗂️ DE | Data Modelling — Dimensional, Data Vault, OBT for the Lakehouse
29 | 🧱 OOP | Idiomatic Python (and a Touch of C++) — Type Hints, Protocols, Dataclasses
30 | 🧠 LLM | LLM Serving Part 1 — vLLM, KV Cache, Continuous Batching
31 | 🏗️ SYS | API Design — REST, GraphQL, gRPC, Versioning, Pagination, Errors
32 | 🗂️ DE | Data Governance — Lineage, Quality, Catalogs, Contracts, Observability
33 | 📈 ML | Practical Fine-Tuning — LoRA, QLoRA, PEFT, Instruction Datasets
34 | 🧠 LLM | LLM Serving Part 2 — Speculative Decoding, Quantisation, Throughput
35 | 🏗️ SYS | News Feed / Timeline System — Fanout-on-Read vs Write, Ranking
36 | 🧠 LLM | Multimodal LLMs — Vision, Language, Audio, Tool Use Combined
37 | 🗂️ DE | Petabyte Cost Optimisation — Compression, Partitioning, Z-Order, File Sizing
38 | 📈 ML | Feature Engineering & Feature Stores at Scale
39 | 🏗️ SYS | Designing a Distributed Job Queue — Reliability, Backoff, Idempotency
40 | 🗂️ DE | Change Data Capture — Debezium, Outbox Pattern, Snapshot+Stream
41 | 🧱 OOP | Testing, Mocks, Property-Based Tests, Mutation Testing
42 | 🧠 LLM | Prompt Engineering at Production Scale — Templates, Caching, Drift
43 | 📈 ML | Online Learning, Bandits, Counterfactual Evaluation
44 | 🏗️ SYS | Designing a Search Engine — Crawl, Index, Query, Ranking
45 | 🧠 LLM | LLM Safety — Jailbreaks, Prompt Injection, Output Filtering, Red-Teaming
46 | 🗂️ DE | Observability for Data Pipelines — SLAs, SLOs, Freshness, Data Tests
47 | 🧱 OOP | Production Error Handling — Retries, Circuit Breakers, Timeouts, Bulkheads
48 | 🧠 LLM | Capstone — Building a Production AI Agent End-to-End
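For topic mode or an interview cram, it can help to treat the session list as data and filter it by track. The following is a small sketch under stated assumptions: the `Session` class, the `pick` helper and the shortened titles are hypothetical and do not come from the repo, and only a handful of rows from the list above are included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    number: int
    track: str   # one of "LLM", "ML", "SYS", "DE", "OOP"
    title: str   # shortened here for brevity


# A few rows copied from the session list; extend with the remaining sessions.
SESSIONS = [
    Session(1, "LLM", "Transformers Part 1"),
    Session(2, "DE", "Spark Part 1"),
    Session(3, "SYS", "URL Shortener Part 1"),
    Session(6, "LLM", "Transformers Part 2"),
    Session(9, "LLM", "RAG Part 1"),
    Session(12, "SYS", "CAP, PACELC, Quorums"),
    Session(13, "LLM", "RAG Part 2"),
]


def pick(sessions, tracks):
    """Return the sessions belonging to any of `tracks`, in session-number order."""
    wanted = set(tracks)
    return sorted(
        (s for s in sessions if s.track in wanted),
        key=lambda s: s.number,
    )


# Topic mode: walk the LLM track end-to-end.
for s in pick(SESSIONS, ["LLM"]):
    print(f"{s.number:02d} {s.track} {s.title}")
```

Sorting by session number keeps Part 1 ahead of Part 2 for the split topics, which is the one ordering worth preserving when you skip around.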
Repo: dinesh-coderepo/preparation/48-sessions
Every session is open. Skip the ones you already know; double down on the ones that bite. There's no deadline — just steady, paced work.