system designintermediate 12m2026-06-09

URL Shortener Part 2 — Cache, CDN, Hot Keys, Abuse

Session 8 of the 48-session learning series.

Why this session matters

This is Session 08 of 48 in the System Design track. It builds on the rhythm of one focused topic, paced so you have time to actually absorb it rather than rush.

Agenda

Read path with cache + CDN + KV — the 95% hit-rate math
Hot-key problem — viral links blow up one Redis shard
Edge caching strategy — do the 302 at the edge
Abuse, phishing, safe-browsing integration
Analytics pipeline — never block the redirect

Pre-read (skim before the session)

Deep dive

1. The read path — where most of the QPS lives

From Part 1: 600k peak read QPS, R:W ≈ 100:1. If every read hit DynamoDB you'd burn money and latency. Solution: layered cache.

User  →  CDN edge  →  Load balancer  →  Resolver svc  →  Redis  →  DynamoDB
         (95% hit)                       (95% hit on miss)   (5% miss)   (last resort)

At steady state, a 95% CDN hit and a 95% Redis hit means the DB sees 600k × 0.05 × 0.05 = 1,500 QPS — totally fine for DynamoDB.

2. Cache hit-rate math (Zipf 0.9)

Short-link traffic is heavily Zipfian — a few links get most of the clicks. With shape parameter ≈ 0.9, the top 10% of codes account for ~85% of traffic. Caching the top 1M codes (out of 180B) in a 10 GB Redis cluster gives you the 95% number above.

Key choices:

TTL: 24h with stale-while-revalidate — serve stale on background refresh.
Negative cache: cache misses too (5 min TTL) so attacks scraping random codes don't hit DB.
Cache key: short:{code} namespace. Value: serialised (long_url, expires_at).

3. Sequence diagram for the redirect

User ── GET /aB3xY9 ──▶ CDN
                              │
                  cached? ──yes──▶ 302 → long URL
                              │
                              no
                              │
                              ▼
                            LB → Resolver
                                       │
                              Redis GET short:aB3xY9
                                       │
                          hit ──yes──▶ 302 + Cache-Control
                                       │
                                       no
                                       ▼
                              DynamoDB GetItem
                                       │
                                       ▼
                              Redis SETEX 24h
                                       ▼
                              302 + Cache-Control public, max-age=86400

Key: the Cache-Control header tells the CDN (and the browser) to cache the 302 itself, so the second hit doesn't even reach Redis.

4. The hot-key problem

A single viral link can hit 50k QPS on its Redis shard, saturating one CPU. Mitigations:

Edge cache doing the 302 — the CDN absorbs it.
Client-side cache — the 302 has Cache-Control: public, max-age=86400 — browser remembers.
Two-level cache — a process-local LRU (1k entries) on every resolver instance. Per-instance hit rate is small but on the hottest 10 keys it absorbs all the load.
Request coalescing — if 100 requests for the same key arrive at the resolver in the same 50 ms while it's in flight, fold them into one DB read.
Sharded counters — only relevant if you're updating a click count synchronously (you shouldn't — see analytics).

5. Edge caching strategy

If you're on Cloudflare Workers / Fastly Compute / AWS CloudFront with Lambda@Edge, you can run the 302 logic at the edge:

edge worker:
  code = path[1:]
  if code in edge_kv:
    return Response.redirect(edge_kv[code], 302)
  long = await origin.get(`/api/resolve/${code}`)
  edge_kv.set(code, long, ttl=86400)
  return Response.redirect(long, 302)

Cloudflare Workers KV is eventually consistent (good enough for redirects) and edge-replicated. P99 hot-path latency drops from ~50 ms to <5 ms.

6. Abuse and security

Phishing / malware — integrate with Google Safe Browsing API on write; re-scan periodically. Block on hit.
PII in URLs — hash for analytics; drop query-string tokens.
Rate limiting — per-IP and per-API-key on writes (e.g. 100 / min). Use a token bucket in Redis with INCR + EXPIRE.
Custom-alias squatting — maintain a blocklist (admin, api, login…) and disallow.
Open redirects that bypass auth on partner sites — emit Referrer-Policy: no-referrer on the 302.
HMAC-signed 302s to detect CDN cache poisoning.

7. Analytics — fire-and-forget

Never write analytics on the hot path. Pattern:

resolver:
  emit Kafka event {code, ts, ua, ip, referer}  # async, non-blocking
  return 302

Downstream:

Flink / Spark Structured Streaming consumes the Kafka topic.
Aggregates per-minute counts into ClickHouse or Druid for dashboards.
Daily roll-ups into the warehouse for retention analysis.

If you write to a time-series DB on the hot path: +50 ms latency, -50% capacity. Don't.

8. Multi-region

Reads are dominantly edge-cached, so a single write region + global read replicas works to ~1B URLs:

Write region: us-east-1 → DynamoDB Global Table replicates everywhere (eventual, ~1 s).
Read regions: every continent. Resolver hits local DynamoDB replica on cache miss.
Latency budget: edge serves at <10 ms; on full miss, regional read is ~30 ms.

Beyond 1B writes, partition codes by prefix across regions; route at the edge based on prefix. Conflict-free because codes are unique by construction.

9. Failure modes (the interviewer will ask)

Failure	What happens	Mitigation
Counter shard down	Writes stall	Batched IDs in app cache buy 5–10 m
Redis cluster split	DB absorbs traffic briefly	Circuit-breaker + serve stale
CDN poisoning	Wrong long URL served	HMAC-signed 302; short TTL
DynamoDB hot partition	Throttling on one key	Cache absorbs; spread analytics writes
Mass abuse scraper	Eats cache + DB capacity	Negative-cache misses; WAF rate-limit

10. Numbers to know for the interview

600k peak read QPS → 95% CDN + 95% Redis → 1.5k QPS at DB.
Cache cost: top 1M of 180B keys gets 95% hit. ~10 GB Redis.
CDN cost: depends on egress, but typically 10–20% of total infra at this scale.
DynamoDB cost: ~$1.25 per million read units at on-demand; far cheaper than self-host SQL at this scale.

11. What's next (Session 12 — CAP / PACELC)

Session 12 zooms out from one system to the general theory: what does it actually mean for a distributed system to be 'consistent' or 'available'? You'll need it for every subsequent SYS session.

Reading material

Books:

System Design Interview Vol. 1 — Alex Xu (ch. 8 continued: caching layer; ch. 7: rate limiter)
Designing Data-Intensive Applications — Kleppmann (ch. 5: Replication, ch. 8: Trouble with Distributed Systems)
The Linux Programming Interface — Michael Kerrisk (relevant for kernel-level rate-limit primitives if you're curious)

Papers:

Web Caching with Consistent Hashing (Karger et al., 1997) — the paper that gave us consistent hashing.
Memcached: A Distributed Memory Object Caching System (Facebook, 2013) — how Facebook ran one of the largest caches ever.

Official docs:

Blog posts:

Caching at Netflix: The Hidden Microservice — Netflix Tech Blog
How Discord Stores Trillions of Messages — Discord — cache layering at scale.
Cloudflare — Why & how we use Workers KV
Hot-key mitigation (Stripe Engineering) — relevant patterns.

In-depth research material

system-design-primer — Caching section — exhaustive comparison of strategies.
Cloudflare blog — DDoS mitigation case studies — abuse-handling patterns directly applicable to short-link redirects.
Netflix EVCache — github.com/Netflix/EVCache — Memcached-on-EC2 production code.
Pinterest Memcached Engineering blog
Bitly — Why we use NSQ + Redis — the messaging+cache stack behind a real shortener.
The Twelve-Factor App: Backing Services — the philosophy behind treating cache/CDN as attached resources.

Videos

Cache Systems Every Developer Should Know — ByteByteGo · 6 min — animated overview of cache-aside, read-through, write-through, write-back.
Caching Pitfalls Every Developer Should Know — ByteByteGo · 7 min — the failure modes (thundering herd, cache stampede, hot keys, TTL drift).
Caching in System Design Interviews — Hello Interview — Ex-Meta Staff Eng · 30 min — interview-grade walk through the strategy palette.
What Is A CDN? How Does It Work? — ByteByteGo · 4 min — CDN basics in 4 minutes — the layer in front of the shortener.
How does Caching on the Backend work? — Software Developer Diaries · 23 min — middle-tier cache layout: which calls hit Redis vs application memory.

LeetCode — Lru Cache

Link: https://leetcode.com/problems/lru-cache/
Difficulty: Medium
Why this problem: Doubly-linked list + hash-map; both ops O(1).
Time-box: 30 minutes. Look up the editorial only after.

Post-session checklist

By the end of this session you should be able to:

Draw the full read path including CDN, Redis, DB — with hit/miss arrows.
Defend the 95/95 cache hit math with a Zipf assumption.
Solve the hot-key problem with at least 3 layered techniques.
Sketch the analytics pipeline; explain why it can't be on the hot path.
Solve lru-cache (Medium) — the workhorse of every cache layer.

Generated from sessions_data.py + content_part*.py. To edit a video / leetcode / title, edit the data file and re-run write_sessions.py.

← previous

Spark Part 2 — Shuffles, Catalyst, AQE, Tuning

RAG Part 1 — Why, Chunking, Embeddings, Vector Stores