Structured plan · 28 days · 5 tracks

The 28-Day Engineering Plan

One deep topic per day, rotating across Data Engineering, Machine Learning, AI & LLMs, OOP & Programming, and System Design. Each day is self-contained: a primary video, three readings, a hands-on exercise, a LeetCode problem, a reflection prompt, and ~1,000 words of distilled notes with diagrams.

28 days · self-paced · standalone, but stronger in sequence.

D01
Day 01 — Transformer Internals — Attention, Embeddings, Positional Encoding
Every modern LLM, agent and RAG stack rests on the transformer. Knowing how Q/K/V flow through multi-head attention with residual streams is the unlock for prom…
AI & LLMs
D02
Day 02 — Apache Spark Architecture — Driver, Executors, Shuffles, Catalyst
Spark is still the workhorse for petabyte ETL and feature engineering. Understanding the execution model is the difference between a 9-minute job and a 9-hour j…
Data Eng
D03
Day 03 — Gradient Boosted Trees — XGBoost / LightGBM, Loss, Regularisation
On tabular data (still the majority of business ML), GBDTs often beat deep nets and are a common default at credit, fraud and ads shops. Knowing the loss math and the…
Machine Learning
D04
Day 04 — Designing a URL Shortener at Scale — IDs, Storage, Cache, CDN
The classic warm-up for every system design loop. It exercises ID generation, key-value modelling, caching, hot-key handling, analytics and CDN edge — all trans…
Sys Design
D05
Day 05 — SOLID Principles + Strategy / Factory / Observer Patterns in Python
Clean OO design isn't legacy folklore — it's how you keep agent frameworks, data pipelines, and microservices maintainable. SOLID + a handful of patterns is the…
OOP / Prog
D06
Day 06 — Retrieval-Augmented Generation (RAG) End-to-End
RAG is the most-shipped LLM pattern in industry today — every internal knowledge bot, support agent and code-search tool is some flavour of it. Knowing chunking…
AI & LLMs
D07
Day 07 — Apache Kafka Deep Dive — Partitions, Replication, Consumer Groups, Exactly-Once
Kafka is the universal log of modern data infra. Mastery of partition keys, consumer-group rebalancing, and EOS is the difference between a streaming system tha…
Data Eng
D08
Day 08 — Embeddings, Vector Spaces & Contrastive Learning
Embeddings power search, RAG, recsys, clustering, deduplication and anomaly detection. Understanding *why* a contrastive objective produces useful vectors (vs s…
Machine Learning
D09
Day 09 — CAP, PACELC, Consensus — Raft, Quorums, and Realistic Trade-offs
Every distributed system you'll design has to make a CAP-style call. Understanding Raft / Paxos and quorum reads/writes lets you reason precisely instead of wav…
Sys Design
D10
Day 10 — Concurrency Models — Threads, Asyncio, GIL, Actors
Every backend you build will block on I/O or compute. Knowing *which* concurrency model to pick (and *why*) can cut latency by as much as 10× and prevents the classes of bugs…
OOP / Prog
D11
Day 11 — Function Calling, Tool Use, and Agentic Loops
Tool calling turns LLMs from text generators into autonomous workers. Mastering the agent loop (plan → call → observe → continue) is the bedrock of every Copilo…
AI & LLMs
D12
Day 12 — Lakehouse Architecture — Delta Lake / Iceberg / Hudi, ACID on Object Storage
The lakehouse is now the default analytics substrate (Databricks, Snowflake Iceberg, Microsoft Fabric, AWS Glue Iceberg). ACID + time travel + schema evolution…
Data Eng
D13
Day 13 — MLOps — Experiment Tracking, Model Registry, CI/CD for Models
Models that don't ship don't matter. MLOps is the engineering wrapper that turns notebook experiments into versioned, monitored, retrainable production assets.
Machine Learning
D14
Day 14 — Sharding, Replication & Multi-Region Databases
The moment one database can't hold your data, you shard. The moment one region can't serve your users, you go multi-region. Both decisions cascade into every ot…
Sys Design
D15
Day 15 — Memory Model & Garbage Collection — Heap, GC, Leaks, Profiling
High-throughput services live and die by GC. Knowing the heap layout, GC algorithms and how to read a flame graph is the difference between '99p = 80 ms' and '9…
OOP / Prog
D16
Day 16 — LLM Evaluation — Benchmarks, LLM-as-Judge, RAGAS, Inspect
If you can't measure it, you can't ship it. Modern LLM eval is its own discipline — task-specific benchmarks, golden sets, LLM judges with rubrics, and slice-le…
AI & LLMs
D17
Day 17 — Streaming with Flink / Spark Structured Streaming — Watermarks & Windows
Real-time analytics, fraud, IoT and personalisation all flow through stream processors. Watermarks, late data, and exactly-once semantics are the hard parts that…
Data Eng
D18
Day 18 — Recommender Systems — Two-Tower, Multi-Stage Ranking
Recsys drives YouTube, TikTok, Amazon, Spotify, Pinterest — and is one of the highest-ROI ML problems anywhere. The two-tower retriever + multi-stage ranker is…
Machine Learning
D19
Day 19 — Designing a Chat / Messaging System at Scale
Chat exercises every hard design lever: fan-out vs fan-in, presence, ordering, push vs pull, media uploads, end-to-end encryption. WhatsApp / Slack / Teams patt…
Sys Design
D20
Day 20 — Idiomatic Python (and C#) — Type Hints, Protocols, Dataclasses, Pattern Matching
Idiomatic code is the difference between a senior who writes maintainable systems and a junior who writes 'Python that runs'. Type hints + Protocols + dataclass…
OOP / Prog
D21
Day 21 — LLM Serving — vLLM, Continuous Batching, KV Cache, Speculative Decoding
Inference cost and latency are the dominant operational concerns for any LLM product. vLLM-style continuous batching can give 5–20× throughput; speculative decodin…
AI & LLMs
D22
Day 22 — Data Modelling — Dimensional, Data Vault, OBT for the Lakehouse Era
Storage is cheap, but a bad model rots a platform from inside. Knowing when to dimensional-model, when to use Data Vault, and when to flat-OBT determines whethe…
Data Eng
D23
Day 23 — Multimodal LLMs — Vision-Language, Audio, and Tool-Use Combined
2025 is the year multimodal went default. GPT-4o, Claude 3.5 Sonnet vision, Gemini 1.5/2 — every serious agent now sees and hears. Understanding how visual toke…
AI & LLMs
D24
Day 24 — Data Governance, Lineage, Quality — Catalogs, Contracts, Observability
At scale, governance isn't bureaucracy; it's how you keep trust in your data. Lineage, quality contracts, and observability tools are now first-class platform c…
Data Eng
D25
Day 25 — Practical Fine-Tuning — LoRA / QLoRA, PEFT, Instruction Datasets, DPO
Fine-tuning is back as the way to specialise models for your domain and reduce inference cost. LoRA + QLoRA make it tractable on commodity GPUs; DPO / ORPO have…
AI & LLMs
D26
Day 26 — Caching Strategies — CDN, Application Cache, Cache-Aside, Read-Through, Write-Through
Caching is the single biggest lever for latency and cost. Cache invalidation is one of two hard problems in CS. Knowing the standard patterns + their failure mo…
Sys Design
D27
Day 27 — API Design — REST, GraphQL, gRPC; Versioning, Pagination, Errors
APIs are contracts that outlive their authors. Bad API design ripples for years; good API design quietly enables product velocity. Knowing when to pick REST / G…
OOP / Prog
D28
Day 28 — Putting It Together — A Production AI Agent (Capstone Day)
Final synthesis day. You've covered transformers, RAG, tools, evals, fine-tuning, serving, multimodal. Today you combine them into one complete agent design — a…
AI & LLMs

All 28 days available.
