Caching Strategies — CDN, Application Cache, Cache-Aside, Read-Through
Session 26 of the 48-session learning series.
Date: Mon, 2026-06-29 · Time: 18:00–20:00 IST · Track: 🏗️ System Design (SYS) · Parent 28-day topic: Day 26 · Est. read: 2 h
Why this session matters
This is Session 26 of 48 in the System Design track. Caching is the cheapest performance lever you have — and the easiest place to introduce correctness bugs. Knowing the patterns by name is what separates a senior from a confident hand-waver in design rounds.
Agenda
- The cache hierarchy — CPU, app, distributed, CDN, browser
- The four patterns: cache-aside, read-through, write-through, write-back
- Eviction & TTL — LRU, LFU, ARC, sliding windows
- Invalidation — the second hardest problem in CS
- Hot keys, thundering herd, stampede control
Pre-read (skim before the session)
- DDIA Chapter 1 — Reliable, Scalable, Maintainable
- Facebook — Scaling Memcache at Facebook
- AWS — Caching strategies
- Cloudflare — How a CDN works
Deep dive
1. The cache hierarchy (memorise this)
[ CPU L1/L2/L3 ] ~1–30 ns KB–MB
[ RAM / page cache ] ~100 ns GBs
[ Local app cache ] ~µs GBs (in-process LRU)
[ Distributed cache ] ~1 ms LAN 100s GB (Redis, Memcached)
[ Database ] ~5–50 ms TBs
[ CDN edge ] ~10–80 ms cached objects
[ Origin ] ~100–500 ms source of truth
Every hop you skip is an order-of-magnitude faster. Design the read path to fall through cleanly: edge → distributed → DB. Each layer should have a sane hit-rate target (CDN 90%+, app cache 60–80%, distributed 50–70%).
2. Cache-aside (lazy load)
App is in charge:
def get(key):
v = cache.get(key)
if v is None:
v = db.get(key)
if v is not None:
cache.set(key, v, ttl=300)
return v
Pros: simple, works with any cache; misses are explicit. Cons: stampede on cold key; first request always slow.
The 90% pattern. Use this unless you have a reason not to.
3. Read-through
Cache library handles miss + load + populate. App just calls cache.get(key). Cache invokes the loader function on miss.
Pros: no boilerplate in app; cache controls concurrency on miss. Cons: tighter coupling between cache layer and DB; harder to debug.
4. Write-through
Every write goes to cache + DB synchronously.
client → app → [ cache.set + db.write ] → ack
Pros: cache is always fresh; reads after writes see correct data. Cons: write latency = max(cache, DB); both must be up.
5. Write-back (write-behind)
Writes hit cache; cache flushes to DB asynchronously.
Pros: lowest write latency; batches writes to DB. Cons: durability gap — power off = data loss. Complex retry / ordering.
Used in: high-write workloads where DB can't keep up (timeline writes, analytics counters). Pair with a persistent buffer (Kafka, durable Redis AOF).
6. Refresh-ahead
Cache proactively refreshes hot keys before TTL expiry. Avoids the user-facing miss.
if key.ttl_remaining < threshold and key.access_frequency > X:
schedule_refresh(key)
Useful for: small set of predictable hot keys (homepage, top product). Bad for cold tail.
7. Eviction policies
| Policy | Behaviour | When to use |
|---|---|---|
| LRU | Evict least-recently-used | Default; good for temporal locality |
| LFU | Evict least-frequently-used | Catalogues with stable hot items |
| ARC | Adaptive between LRU/LFU | Mixed workloads (DB block cache) |
| FIFO | Evict oldest | Almost never |
| TinyLFU + W-TinyLFU | Frequency + recency hybrid | Modern caches (Caffeine) |
| Random | Random eviction | Surprisingly competitive |
Default to LRU. Move to W-TinyLFU (e.g. Caffeine in JVM) when scan resistance matters.
8. TTL strategies
- Fixed TTL — simple; risk of synchronised expiry.
- Jittered TTL —
ttl = base + uniform(-jitter, +jitter). Always do this. - Sliding TTL — extend on access. Keeps hot items warm.
- Event-driven — TTL until source of truth changes (more pub/sub than TTL).
9. Invalidation
"There are only two hard things in Computer Science: cache invalidation and naming things." — Phil Karlton
Patterns:
- TTL only — simplest; tolerates staleness up to TTL.
- Write-through — invalidate on every write. Coupled but correct.
- Versioned key —
user:42:v17. Bump version on write; old keys age out. - Pub/sub invalidation — DB publishes change events; caches subscribe and evict.
- Tagged invalidation — group keys by tag; invalidate by tag (e.g. "all posts by user X").
Two-cache problem: app cache + CDN cache. Invalidate both. Easy to forget the CDN.
10. Thundering herd / cache stampede
Hot key TTL expires → 10k requests all miss → all hit DB → DB falls over.
Mitigations:
- Single-flight — only one in-flight load per key; others wait on a future.
- Probabilistic early expiration — refresh near (not at) TTL with rising probability.
- Soft TTL — serve stale while reloading in background.
- Locking —
SETNX cache.lockbefore reload; loser falls back to stale.
Production combo: single-flight + soft TTL + jittered hard TTL. Stampede deaths drop to near zero.
11. Hot keys
When one key gets 10% of all traffic (celebrity user, viral post):
- Replicate — store key on N replicas; client picks one at random.
- Local cache promotion — copy hot keys to in-process L1.
- Sharding — append shard suffix per requester; merge at read or pre-aggregate.
- Coalescing — counters can be incremented locally and flushed periodically.
Detection: log p99 keys; alert when any key crosses 1% of total ops.
12. CDN specifics
CDN edges cache by URL. Useful primitives:
Cache-Control: public, max-age=86400, s-maxage=3600stale-while-revalidate=600— serve old up to 10 minutes while refreshing.stale-if-error=86400— serve old for a day if origin down.- Surrogate keys / cache tags for invalidation.
CDN configuration is part of system design now. A 90% CDN hit rate is the difference between 50/mo of egress.
13. Negative caching
Cache the absence of a key. Prevents repeated DB lookups for a missing record.
v = cache.get(key)
if v == NEGATIVE_SENTINEL:
return None
if v is None:
v = db.get(key)
cache.set(key, v if v else NEGATIVE_SENTINEL, ttl=60 if v else 10)
Critical defence against scan attacks and ?user_id=99999999 probing.
14. Reality check
Cache hit rate is the headline metric. Track:
- Hit rate per cache tier (CDN, distributed, local).
- Latency percentiles for hits vs misses.
- Eviction rate (high = cache too small or churn too high).
- Stampede events (single-flight queue depth).
Build a /cache/stats endpoint. You'll thank yourself at 3 AM.
Reading material
- Designing Data-Intensive Applications — chapter on caching primitives
- Scaling Memcache at Facebook (NSDI'13)
- Caffeine — TinyLFU paper
- Redis best practices
In-depth research material
- AWS — Database caching strategies
- Cloudflare blog — cache key best practices
- Netflix — EVCache
- Discord — How Discord stores trillions of messages
Video reference
▶︎ Caching Patterns Deep-Dive (System Design Interview)
Pick a quiet 30 minutes during this session to actually watch it. Don't multitask.
LeetCode — LRU Cache II
- Link: https://leetcode.com/problems/lru-cache-ii/
- Difficulty: Medium
- Why this problem: Implementing LRU with O(1) get/put using hash + doubly-linked list is the canonical cache exercise.
- Time-box: 30 minutes. Look up the editorial only after.
Post-session checklist
By the end of this session you should be able to:
- Name the 4 cache patterns and pick the right one for write-heavy vs read-heavy workloads.
- Explain LRU vs LFU vs ARC vs W-TinyLFU and when each wins.
- Write a single-flight + soft-TTL reload to prevent stampede.
- Configure
Cache-Control + stale-while-revalidatefor a CDN. - Spot a hot key and apply 2 mitigations (replication, sharding).
- Solve
lru-cache-ii— hash + doubly-linked list with O(1) ops.
Generated from sessions_data.py + content_part*.py. To edit a video / leetcode / title, edit the data file and re-run write_sessions.py.