Search Tech Journey

Find topics, journeys and posts

back to blog
system designintermediate 12m2026-06-09

URL Shortener Part 2 — Cache, CDN, Hot Keys, Abuse

Session 8 of the 48-session learning series.

Why this session matters

This is Session 08 of 48 in the System Design track. It builds on the rhythm of one focused topic, paced so you have time to actually absorb it rather than rush.

Agenda

  • Read path with cache + CDN + KV — the 95% hit-rate math
  • Hot-key problem — viral links blow up one Redis shard
  • Edge caching strategy — do the 302 at the edge
  • Abuse, phishing, safe-browsing integration
  • Analytics pipeline — never block the redirect

Pre-read (skim before the session)

Deep dive

1. The read path — where most of the QPS lives

From Part 1: 600k peak read QPS, R:W ≈ 100:1. If every read hit DynamoDB you'd burn money and latency. Solution: layered cache.

User  →  CDN edge  →  Load balancer  →  Resolver svc  →  Redis  →  DynamoDB
         (95% hit)                       (95% hit on miss)   (5% miss)   (last resort)

At steady state, a 95% CDN hit and a 95% Redis hit means the DB sees 600k × 0.05 × 0.05 = 1,500 QPS — totally fine for DynamoDB.

2. Cache hit-rate math (Zipf 0.9)

Short-link traffic is heavily Zipfian — a few links get most of the clicks. With shape parameter ≈ 0.9, the top 10% of codes account for ~85% of traffic. Caching the top 1M codes (out of 180B) in a 10 GB Redis cluster gives you the 95% number above.

Key choices:

  • TTL: 24h with stale-while-revalidate — serve stale on background refresh.
  • Negative cache: cache misses too (5 min TTL) so attacks scraping random codes don't hit DB.
  • Cache key: short:{code} namespace. Value: serialised (long_url, expires_at).

3. Sequence diagram for the redirect

User ── GET /aB3xY9 ──▶ CDN
                              │
                  cached? ──yes──▶ 302 → long URL
                              │
                              no
                              │
                              ▼
                            LB → Resolver
                                       │
                              Redis GET short:aB3xY9
                                       │
                          hit ──yes──▶ 302 + Cache-Control
                                       │
                                       no
                                       ▼
                              DynamoDB GetItem
                                       │
                                       ▼
                              Redis SETEX 24h
                                       ▼
                              302 + Cache-Control public, max-age=86400

Key: the Cache-Control header tells the CDN (and the browser) to cache the 302 itself, so the second hit doesn't even reach Redis.

4. The hot-key problem

A single viral link can hit 50k QPS on its Redis shard, saturating one CPU. Mitigations:

  • Edge cache doing the 302 — the CDN absorbs it.
  • Client-side cache — the 302 has Cache-Control: public, max-age=86400 — browser remembers.
  • Two-level cache — a process-local LRU (1k entries) on every resolver instance. Per-instance hit rate is small but on the hottest 10 keys it absorbs all the load.
  • Request coalescing — if 100 requests for the same key arrive at the resolver in the same 50 ms while it's in flight, fold them into one DB read.
  • Sharded counters — only relevant if you're updating a click count synchronously (you shouldn't — see analytics).

5. Edge caching strategy

If you're on Cloudflare Workers / Fastly Compute / AWS CloudFront with Lambda@Edge, you can run the 302 logic at the edge:

edge worker:
  code = path[1:]
  if code in edge_kv:
    return Response.redirect(edge_kv[code], 302)
  long = await origin.get(`/api/resolve/${code}`)
  edge_kv.set(code, long, ttl=86400)
  return Response.redirect(long, 302)

Cloudflare Workers KV is eventually consistent (good enough for redirects) and edge-replicated. P99 hot-path latency drops from ~50 ms to <5 ms.

6. Abuse and security

  • Phishing / malware — integrate with Google Safe Browsing API on write; re-scan periodically. Block on hit.
  • PII in URLs — hash for analytics; drop query-string tokens.
  • Rate limiting — per-IP and per-API-key on writes (e.g. 100 / min). Use a token bucket in Redis with INCR + EXPIRE.
  • Custom-alias squatting — maintain a blocklist (admin, api, login…) and disallow.
  • Open redirects that bypass auth on partner sites — emit Referrer-Policy: no-referrer on the 302.
  • HMAC-signed 302s to detect CDN cache poisoning.

7. Analytics — fire-and-forget

Never write analytics on the hot path. Pattern:

resolver:
  emit Kafka event {code, ts, ua, ip, referer}  # async, non-blocking
  return 302

Downstream:

  • Flink / Spark Structured Streaming consumes the Kafka topic.
  • Aggregates per-minute counts into ClickHouse or Druid for dashboards.
  • Daily roll-ups into the warehouse for retention analysis.

If you write to a time-series DB on the hot path: +50 ms latency, -50% capacity. Don't.

8. Multi-region

Reads are dominantly edge-cached, so a single write region + global read replicas works to ~1B URLs:

  • Write region: us-east-1 → DynamoDB Global Table replicates everywhere (eventual, ~1 s).
  • Read regions: every continent. Resolver hits local DynamoDB replica on cache miss.
  • Latency budget: edge serves at <10 ms; on full miss, regional read is ~30 ms.

Beyond 1B writes, partition codes by prefix across regions; route at the edge based on prefix. Conflict-free because codes are unique by construction.

9. Failure modes (the interviewer will ask)

FailureWhat happensMitigation
Counter shard downWrites stallBatched IDs in app cache buy 5–10 m
Redis cluster splitDB absorbs traffic brieflyCircuit-breaker + serve stale
CDN poisoningWrong long URL servedHMAC-signed 302; short TTL
DynamoDB hot partitionThrottling on one keyCache absorbs; spread analytics writes
Mass abuse scraperEats cache + DB capacityNegative-cache misses; WAF rate-limit

10. Numbers to know for the interview

  • 600k peak read QPS → 95% CDN + 95% Redis → 1.5k QPS at DB.
  • Cache cost: top 1M of 180B keys gets 95% hit. ~10 GB Redis.
  • CDN cost: depends on egress, but typically 10–20% of total infra at this scale.
  • DynamoDB cost: ~$1.25 per million read units at on-demand; far cheaper than self-host SQL at this scale.

11. What's next (Session 12 — CAP / PACELC)

Session 12 zooms out from one system to the general theory: what does it actually mean for a distributed system to be 'consistent' or 'available'? You'll need it for every subsequent SYS session.

Reading material

Books:

  • System Design Interview Vol. 1 — Alex Xu (ch. 8 continued: caching layer; ch. 7: rate limiter)
  • Designing Data-Intensive Applications — Kleppmann (ch. 5: Replication, ch. 8: Trouble with Distributed Systems)
  • The Linux Programming Interface — Michael Kerrisk (relevant for kernel-level rate-limit primitives if you're curious)

Papers:

Official docs:

Blog posts:

In-depth research material

Videos

LeetCode — Lru Cache

Post-session checklist

By the end of this session you should be able to:

  • Draw the full read path including CDN, Redis, DB — with hit/miss arrows.
  • Defend the 95/95 cache hit math with a Zipf assumption.
  • Solve the hot-key problem with at least 3 layered techniques.
  • Sketch the analytics pipeline; explain why it can't be on the hot path.
  • Solve lru-cache (Medium) — the workhorse of every cache layer.

Generated from sessions_data.py + content_part*.py. To edit a video / leetcode / title, edit the data file and re-run write_sessions.py.