URL Shortener Part 2 — Cache, CDN, Hot Keys, Abuse
Session 8 of the 48-session learning series.
Why this session matters
This is Session 08 of 48 in the System Design track. It builds on the rhythm of one focused topic, paced so you have time to actually absorb it rather than rush.
Agenda
- Read path with cache + CDN + KV — the 95% hit-rate math
- Hot-key problem — viral links blow up one Redis shard
- Edge caching strategy — do the 302 at the edge
- Abuse, phishing, safe-browsing integration
- Analytics pipeline — never block the redirect
Pre-read (skim before the session)
- ByteByteGo — Top 5 Caching Strategies
- Cloudflare — What is a CDN?
- Bitly engineering — dealing with hot shorts
- DDIA Ch. 5 — Replication
Deep dive
1. The read path — where most of the QPS lives
From Part 1: 600k peak read QPS, R:W ≈ 100:1. If every read hit DynamoDB you'd burn money and latency. Solution: layered cache.
User → CDN edge → Load balancer → Resolver svc → Redis → DynamoDB
(95% hit) (95% hit on miss) (5% miss) (last resort)
At steady state, a 95% CDN hit and a 95% Redis hit means the DB sees 600k × 0.05 × 0.05 = 1,500 QPS — totally fine for DynamoDB.
2. Cache hit-rate math (Zipf 0.9)
Short-link traffic is heavily Zipfian — a few links get most of the clicks. With shape parameter ≈ 0.9, the top 10% of codes account for ~85% of traffic. Caching the top 1M codes (out of 180B) in a 10 GB Redis cluster gives you the 95% number above.
Key choices:
- TTL: 24h with stale-while-revalidate — serve stale on background refresh.
- Negative cache: cache misses too (5 min TTL) so attacks scraping random codes don't hit DB.
- Cache key:
short:{code}namespace. Value: serialised(long_url, expires_at).
3. Sequence diagram for the redirect
User ── GET /aB3xY9 ──▶ CDN
│
cached? ──yes──▶ 302 → long URL
│
no
│
▼
LB → Resolver
│
Redis GET short:aB3xY9
│
hit ──yes──▶ 302 + Cache-Control
│
no
▼
DynamoDB GetItem
│
▼
Redis SETEX 24h
▼
302 + Cache-Control public, max-age=86400
Key: the Cache-Control header tells the CDN (and the browser) to cache the 302 itself, so the second hit doesn't even reach Redis.
4. The hot-key problem
A single viral link can hit 50k QPS on its Redis shard, saturating one CPU. Mitigations:
- Edge cache doing the 302 — the CDN absorbs it.
- Client-side cache — the 302 has
Cache-Control: public, max-age=86400— browser remembers. - Two-level cache — a process-local LRU (1k entries) on every resolver instance. Per-instance hit rate is small but on the hottest 10 keys it absorbs all the load.
- Request coalescing — if 100 requests for the same key arrive at the resolver in the same 50 ms while it's in flight, fold them into one DB read.
- Sharded counters — only relevant if you're updating a click count synchronously (you shouldn't — see analytics).
5. Edge caching strategy
If you're on Cloudflare Workers / Fastly Compute / AWS CloudFront with Lambda@Edge, you can run the 302 logic at the edge:
edge worker:
code = path[1:]
if code in edge_kv:
return Response.redirect(edge_kv[code], 302)
long = await origin.get(`/api/resolve/${code}`)
edge_kv.set(code, long, ttl=86400)
return Response.redirect(long, 302)
Cloudflare Workers KV is eventually consistent (good enough for redirects) and edge-replicated. P99 hot-path latency drops from ~50 ms to <5 ms.
6. Abuse and security
- Phishing / malware — integrate with Google Safe Browsing API on write; re-scan periodically. Block on hit.
- PII in URLs — hash for analytics; drop query-string tokens.
- Rate limiting — per-IP and per-API-key on writes (e.g. 100 / min). Use a token bucket in Redis with
INCR + EXPIRE. - Custom-alias squatting — maintain a blocklist (admin, api, login…) and disallow.
- Open redirects that bypass auth on partner sites — emit
Referrer-Policy: no-referreron the 302. - HMAC-signed 302s to detect CDN cache poisoning.
7. Analytics — fire-and-forget
Never write analytics on the hot path. Pattern:
resolver:
emit Kafka event {code, ts, ua, ip, referer} # async, non-blocking
return 302
Downstream:
- Flink / Spark Structured Streaming consumes the Kafka topic.
- Aggregates per-minute counts into ClickHouse or Druid for dashboards.
- Daily roll-ups into the warehouse for retention analysis.
If you write to a time-series DB on the hot path: +50 ms latency, -50% capacity. Don't.
8. Multi-region
Reads are dominantly edge-cached, so a single write region + global read replicas works to ~1B URLs:
- Write region: us-east-1 → DynamoDB Global Table replicates everywhere (eventual, ~1 s).
- Read regions: every continent. Resolver hits local DynamoDB replica on cache miss.
- Latency budget: edge serves at <10 ms; on full miss, regional read is ~30 ms.
Beyond 1B writes, partition codes by prefix across regions; route at the edge based on prefix. Conflict-free because codes are unique by construction.
9. Failure modes (the interviewer will ask)
| Failure | What happens | Mitigation |
|---|---|---|
| Counter shard down | Writes stall | Batched IDs in app cache buy 5–10 m |
| Redis cluster split | DB absorbs traffic briefly | Circuit-breaker + serve stale |
| CDN poisoning | Wrong long URL served | HMAC-signed 302; short TTL |
| DynamoDB hot partition | Throttling on one key | Cache absorbs; spread analytics writes |
| Mass abuse scraper | Eats cache + DB capacity | Negative-cache misses; WAF rate-limit |
10. Numbers to know for the interview
- 600k peak read QPS → 95% CDN + 95% Redis → 1.5k QPS at DB.
- Cache cost: top 1M of 180B keys gets 95% hit. ~10 GB Redis.
- CDN cost: depends on egress, but typically 10–20% of total infra at this scale.
- DynamoDB cost: ~$1.25 per million read units at on-demand; far cheaper than self-host SQL at this scale.
11. What's next (Session 12 — CAP / PACELC)
Session 12 zooms out from one system to the general theory: what does it actually mean for a distributed system to be 'consistent' or 'available'? You'll need it for every subsequent SYS session.
Reading material
Books:
- System Design Interview Vol. 1 — Alex Xu (ch. 8 continued: caching layer; ch. 7: rate limiter)
- Designing Data-Intensive Applications — Kleppmann (ch. 5: Replication, ch. 8: Trouble with Distributed Systems)
- The Linux Programming Interface — Michael Kerrisk (relevant for kernel-level rate-limit primitives if you're curious)
Papers:
- Web Caching with Consistent Hashing (Karger et al., 1997) — the paper that gave us consistent hashing.
- Memcached: A Distributed Memory Object Caching System (Facebook, 2013) — how Facebook ran one of the largest caches ever.
Official docs:
Blog posts:
- Caching at Netflix: The Hidden Microservice — Netflix Tech Blog
- How Discord Stores Trillions of Messages — Discord — cache layering at scale.
- Cloudflare — Why & how we use Workers KV
- Hot-key mitigation (Stripe Engineering) — relevant patterns.
In-depth research material
- system-design-primer — Caching section — exhaustive comparison of strategies.
- Cloudflare blog — DDoS mitigation case studies — abuse-handling patterns directly applicable to short-link redirects.
- Netflix EVCache — github.com/Netflix/EVCache — Memcached-on-EC2 production code.
- Pinterest Memcached Engineering blog
- Bitly — Why we use NSQ + Redis — the messaging+cache stack behind a real shortener.
- The Twelve-Factor App: Backing Services — the philosophy behind treating cache/CDN as attached resources.
Videos
- Cache Systems Every Developer Should Know — ByteByteGo · 6 min — animated overview of cache-aside, read-through, write-through, write-back.
- Caching Pitfalls Every Developer Should Know — ByteByteGo · 7 min — the failure modes (thundering herd, cache stampede, hot keys, TTL drift).
- Caching in System Design Interviews — Hello Interview — Ex-Meta Staff Eng · 30 min — interview-grade walk through the strategy palette.
- What Is A CDN? How Does It Work? — ByteByteGo · 4 min — CDN basics in 4 minutes — the layer in front of the shortener.
- How does Caching on the Backend work? — Software Developer Diaries · 23 min — middle-tier cache layout: which calls hit Redis vs application memory.
LeetCode — Lru Cache
- Link: https://leetcode.com/problems/lru-cache/
- Difficulty: Medium
- Why this problem: Doubly-linked list + hash-map; both ops O(1).
- Time-box: 30 minutes. Look up the editorial only after.
Post-session checklist
By the end of this session you should be able to:
- Draw the full read path including CDN, Redis, DB — with hit/miss arrows.
- Defend the 95/95 cache hit math with a Zipf assumption.
- Solve the hot-key problem with at least 3 layered techniques.
- Sketch the analytics pipeline; explain why it can't be on the hot path.
- Solve
lru-cache(Medium) — the workhorse of every cache layer.
Generated from sessions_data.py + content_part*.py. To edit a video / leetcode / title, edit the data file and re-run write_sessions.py.