Search Tech Journey

Find topics, journeys and posts

back to blog
system designintermediate 12m2026-06-09

URL Shortener Part 1 — Numbers, IDs, Storage

Session 3 of the 48-session learning series.

Why this session matters

This is Session 03 of 48 in the System Design track. It builds on the rhythm of one focused topic, paced so you have time to actually absorb it rather than rush.

Agenda

  • Back-of-envelope at 100M writes/day, 10B reads/day
  • Three ID schemes — hash, counter+base62, Snowflake — trade-offs
  • Choosing the storage engine — KV store vs SQL vs NoSQL
  • Schema design and idempotency on the write path
  • Why the cache + CDN discussion is its own session (Part 2)

Pre-read (skim before the session)

Deep dive

1. The shape of the problem

TinyURL looks trivial until you put numbers on it. Design for 100M new URLs/day, 10B reads/day, 5-year retention and let the layers fall out.

2. Back-of-envelope (always do this first)

  • Writes: 100M / 86,400 s ≈ 1.2k QPS average, 6k QPS at 5× peak.
  • Reads: 10B / 86,400 s ≈ 115k QPS average, 600k QPS at peak. R:W ≈ 100:1.
  • Storage: 100M × 365 × 5 ≈ 180 B records. Each record ≈ 500 B (short, long, owner, created_at, expires_at, click_count) → 90 TB. Plus 2–3× replication.

State these numbers out loud before you talk about anything else. They drive every subsequent decision.

3. ID / short-code scheme — three real options

Option A — Hash(long_url)[:7] in base62

  • ✅ Deterministic, idempotent (same long → same short).
  • ❌ Collisions need a check-and-retry against an index. Extra round trip.
  • ❌ Doesn't help if two users want different shorts for the same URL.

Option B — Auto-increment counter + base62

  • ✅ No collisions ever. Shortest codes (6–7 chars for 100B records: log₆₂(1e11) ≈ 6.1).
  • ❌ Single counter = single point of contention.
  • ✅ Mitigate with batch allocation: each app server reserves 10k IDs at once via INCRBY counter 10000 on Redis. Local cache, lock-free use, refill when low.

Option C — Snowflake (64-bit)

  • 41 bits timestamp | 10 bits machine_id | 12 bits sequence → coordination-free, sortable by time.
  • Base62 of 64 bits = 11 chars (too long). Truncate to 48 bits → 8 chars.
  • ✅ No coordination required across machines.
  • ❌ Codes leak creation time and machine_id (mild info leak).

Pick: For 100M/day I'd choose counter with batched allocation (Option B). Simplest, shortest codes, contention solved by batching. State the alternative and why you rejected it.

4. Storage — the access pattern is point-lookup by short code

This is textbook KV: O(1) lookup by single key, no scans.

ChoiceFit
DynamoDB / CassandraHash-partitioned by short_code. Horizontal scale. 90 TB is fine.
BigtableSame model, GCP-flavoured.
Postgres (sharded)Works to a point. At 600k read QPS you need read replicas + Citus. Operationally heavier.
Redis (sole store)Too expensive at 90 TB and not durable enough for source of truth.

Pick: DynamoDB (or Cassandra if you self-host). Hash-partition by short_code.

Schema:

short_code  STRING  PK
long_url    STRING
owner_id    STRING
created_at  TIMESTAMP
expires_at  TIMESTAMP
meta        MAP<STRING,STRING>      -- UTM, custom tags

Add a secondary index on (owner_id, created_at) only if you need owner-scoped listings — they're not on the hot path.

5. Write path

1. Validate URL (length ≤ 2048, scheme http/https, blocklist).
2. Idempotency check: if hash(long_url, owner) already exists → return it.
3. Pop next short_code from local batch (refill via INCRBY 10000 on Redis).
4. Conditional put to DynamoDB (so we catch the very rare race).
5. Async publish to Kafka — analytics consumer aggregates later.
6. Return 201 with short URL.

Two failure modes to think about now:

  • Counter shard dies. Batched IDs in app caches keep writes alive for minutes. Alert, re-elect.
  • Conditional put fails. Hash collision (rare with 64-bit codes). Increment local counter and retry.

6. Owner / multi-tenant data model

If you want custom aliases (https://tinyurl.com/dinesh-blog) you need:

  • Uniqueness across the global namespace.
  • A reserved-words blocklist (admin, api, login, etc.).
  • Per-owner quota tracking (count + storage).
  • A second logical table aliases (alias_str PK, short_code, owner_id) so resolution still goes through the same KV path.

7. Putting it on the map

┌───────┐     ┌───────────┐
│ User  │────▶│ Load bal. │ HTTPS
└───────┘     └────┬──────┘
                   │
            ┌──────▼──────┐
            │  Shorten    │  POST /api/shorten
            │  service    │  (validates, picks id)
            └──────┬──────┘
                   │   put
            ┌──────▼──────┐    INCRBY 10000
            │  DynamoDB   │◀───── Redis counter shard
            └──────┬──────┘
                   │ async event
            ┌──────▼──────┐
            │   Kafka     │ → analytics consumer
            └─────────────┘

Read path lives in Part 2 (Session 8) — that's where cache, CDN, hot-key handling, and stale-while-revalidate go. Putting both in one session is what made the 28-day plan feel rushed.

8. Numbers you should be able to defend

  • Why 6–7 char codes? log₆₂(180B / collision-safety-factor) ≈ 6.5 — pick 7 for headroom.
  • Why DynamoDB over Postgres? At 600k read QPS, sharding Postgres is a job in itself. DynamoDB charges by request and storage — predictable.
  • Why batched IDs? At 6k peak write QPS with a 10k batch, you refill once every ≈ 1.7 s per app server. Counter QPS = app_servers / 1.7.
  • Why conditional put? Race window is microseconds, but at 100M/day even microsecond races happen daily.

9. What we're saving for Part 2 (Session 8)

  • Cache layer (Redis cluster), 95% hit-rate math at Zipf 0.9
  • CDN edge — doing the 302 at the edge for read-heavy keys
  • Hot-key problem — viral short codes
  • Abuse / safe-browsing integration
  • Analytics — Kafka → Flink → ClickHouse
  • Multi-region trade-offs

Reading material

Books:

  • System Design Interview Vol. 1 — Alex Xu (ch. 8: Design a URL Shortener)
  • Designing Data-Intensive Applications — Martin Kleppmann (chs. 2, 3, 6 for storage & indexing)
  • Database Internals — Alex Petrov (ch. on B-trees vs LSM trees — why Postgres vs Cassandra for the mapping table)

Papers:

Official docs:

Blog posts:

In-depth research material

Videos

LeetCode — Encode And Decode Tinyurl

Post-session checklist

By the end of this session you should be able to:

  • State the back-of-envelope numbers (QPS write, QPS read, storage) for the spec.
  • Pick an ID scheme and defend the choice; name what you sacrificed.
  • Defend the choice of KV store over RDBMS at this scale.
  • Sketch the write path including idempotency and async analytics.
  • Solve the LeetCode encode-and-decode-tinyurl problem using your chosen scheme.

Generated from sessions_data.py + content_part*.py. To edit a video / leetcode / title, edit the data file and re-run write_sessions.py.