system designintermediate 12m2026-06-09

Caching Strategies — CDN, Application Cache, Cache-Aside, Read-Through

Session 26 of the 48-session learning series.

Date: Mon, 2026-06-29 · Time: 18:00–20:00 IST · Track: 🏗️ System Design (SYS) · Parent 28-day topic: Day 26 · Est. read: 2 h

Why this session matters

This is Session 26 of 48 in the System Design track. Caching is the cheapest performance lever you have — and the easiest place to introduce correctness bugs. Knowing the patterns by name is what separates a senior from a confident hand-waver in design rounds.

Agenda

The cache hierarchy — CPU, app, distributed, CDN, browser
The four patterns: cache-aside, read-through, write-through, write-back
Eviction & TTL — LRU, LFU, ARC, sliding windows
Invalidation — the second hardest problem in CS
Hot keys, thundering herd, stampede control

Pre-read (skim before the session)

Deep dive

1. The cache hierarchy (memorise this)

[ CPU L1/L2/L3 ]    ~1–30 ns      KB–MB
[ RAM / page cache ] ~100 ns      GBs
[ Local app cache ]  ~µs          GBs (in-process LRU)
[ Distributed cache ] ~1 ms LAN   100s GB (Redis, Memcached)
[ Database ]         ~5–50 ms     TBs
[ CDN edge ]         ~10–80 ms    cached objects
[ Origin ]           ~100–500 ms  source of truth

Every hop you skip is an order-of-magnitude faster. Design the read path to fall through cleanly: edge → distributed → DB. Each layer should have a sane hit-rate target (CDN 90%+, app cache 60–80%, distributed 50–70%).

2. Cache-aside (lazy load)

App is in charge:

def get(key):
    v = cache.get(key)
    if v is None:
        v = db.get(key)
        if v is not None:
            cache.set(key, v, ttl=300)
    return v

Pros: simple, works with any cache; misses are explicit. Cons: stampede on cold key; first request always slow.

The 90% pattern. Use this unless you have a reason not to.

3. Read-through

Cache library handles miss + load + populate. App just calls cache.get(key). Cache invokes the loader function on miss.

Pros: no boilerplate in app; cache controls concurrency on miss. Cons: tighter coupling between cache layer and DB; harder to debug.

4. Write-through

Every write goes to cache + DB synchronously.

client → app → [ cache.set + db.write ] → ack

Pros: cache is always fresh; reads after writes see correct data. Cons: write latency = max(cache, DB); both must be up.

5. Write-back (write-behind)

Writes hit cache; cache flushes to DB asynchronously.

Pros: lowest write latency; batches writes to DB. Cons: durability gap — power off = data loss. Complex retry / ordering.

Used in: high-write workloads where DB can't keep up (timeline writes, analytics counters). Pair with a persistent buffer (Kafka, durable Redis AOF).

6. Refresh-ahead

Cache proactively refreshes hot keys before TTL expiry. Avoids the user-facing miss.

if key.ttl_remaining < threshold and key.access_frequency > X:
    schedule_refresh(key)

Useful for: small set of predictable hot keys (homepage, top product). Bad for cold tail.

7. Eviction policies

Policy	Behaviour	When to use
LRU	Evict least-recently-used	Default; good for temporal locality
LFU	Evict least-frequently-used	Catalogues with stable hot items
ARC	Adaptive between LRU/LFU	Mixed workloads (DB block cache)
FIFO	Evict oldest	Almost never
TinyLFU + W-TinyLFU	Frequency + recency hybrid	Modern caches (Caffeine)
Random	Random eviction	Surprisingly competitive

Default to LRU. Move to W-TinyLFU (e.g. Caffeine in JVM) when scan resistance matters.

8. TTL strategies

Fixed TTL — simple; risk of synchronised expiry.
Jittered TTL — ttl = base + uniform(-jitter, +jitter). Always do this.
Sliding TTL — extend on access. Keeps hot items warm.
Event-driven — TTL until source of truth changes (more pub/sub than TTL).

9. Invalidation

"There are only two hard things in Computer Science: cache invalidation and naming things." — Phil Karlton

Patterns:

TTL only — simplest; tolerates staleness up to TTL.
Write-through — invalidate on every write. Coupled but correct.
Versioned key — user:42:v17. Bump version on write; old keys age out.
Pub/sub invalidation — DB publishes change events; caches subscribe and evict.
Tagged invalidation — group keys by tag; invalidate by tag (e.g. "all posts by user X").

Two-cache problem: app cache + CDN cache. Invalidate both. Easy to forget the CDN.

10. Thundering herd / cache stampede

Hot key TTL expires → 10k requests all miss → all hit DB → DB falls over.

Mitigations:

Single-flight — only one in-flight load per key; others wait on a future.
Probabilistic early expiration — refresh near (not at) TTL with rising probability.
Soft TTL — serve stale while reloading in background.
Locking — SETNX cache.lock before reload; loser falls back to stale.

Production combo: single-flight + soft TTL + jittered hard TTL. Stampede deaths drop to near zero.

11. Hot keys

When one key gets 10% of all traffic (celebrity user, viral post):

Replicate — store key on N replicas; client picks one at random.
Local cache promotion — copy hot keys to in-process L1.
Sharding — append shard suffix per requester; merge at read or pre-aggregate.
Coalescing — counters can be incremented locally and flushed periodically.

Detection: log p99 keys; alert when any key crosses 1% of total ops.

12. CDN specifics

CDN edges cache by URL. Useful primitives:

Cache-Control: public, max-age=86400, s-maxage=3600
stale-while-revalidate=600 — serve old up to 10 minutes while refreshing.
stale-if-error=86400 — serve old for a day if origin down.
Surrogate keys / cache tags for invalidation.

CDN configuration is part of system design now. A 90% CDN hit rate is the difference between $500/mo and$ 50/mo of egress.

13. Negative caching

Cache the absence of a key. Prevents repeated DB lookups for a missing record.

v = cache.get(key)
if v == NEGATIVE_SENTINEL:
    return None
if v is None:
    v = db.get(key)
    cache.set(key, v if v else NEGATIVE_SENTINEL, ttl=60 if v else 10)

Critical defence against scan attacks and ?user_id=99999999 probing.

14. Reality check

Cache hit rate is the headline metric. Track:

Hit rate per cache tier (CDN, distributed, local).
Latency percentiles for hits vs misses.
Eviction rate (high = cache too small or churn too high).
Stampede events (single-flight queue depth).

Build a /cache/stats endpoint. You'll thank yourself at 3 AM.

Link: https://leetcode.com/problems/lru-cache-ii/
Difficulty: Medium
Why this problem: Implementing LRU with O(1) get/put using hash + doubly-linked list is the canonical cache exercise.
Time-box: 30 minutes. Look up the editorial only after.

Post-session checklist

By the end of this session you should be able to:

Name the 4 cache patterns and pick the right one for write-heavy vs read-heavy workloads.
Explain LRU vs LFU vs ARC vs W-TinyLFU and when each wins.
Write a single-flight + soft-TTL reload to prevent stampede.
Configure Cache-Control + stale-while-revalidate for a CDN.
Spot a hot key and apply 2 mitigations (replication, sharding).
Solve lru-cache-ii — hash + doubly-linked list with O(1) ops.

Generated from sessions_data.py + content_part*.py. To edit a video / leetcode / title, edit the data file and re-run write_sessions.py.

← previous

LLM Evaluation — Benchmarks, LLM-as-Judge, RAGAS, Inspect

Recommender Systems — Two-Tower, Multi-Stage Ranking