system designintermediate 12m2026-06-09

URL Shortener Part 1 — Numbers, IDs, Storage

Session 3 of the 48-session learning series.

Why this session matters

This is Session 03 of 48 in the System Design track. It builds on the rhythm of one focused topic, paced so you have time to actually absorb it rather than rush.

Agenda

Back-of-envelope at 100M writes/day, 10B reads/day
Three ID schemes — hash, counter+base62, Snowflake — trade-offs
Choosing the storage engine — KV store vs SQL vs NoSQL
Schema design and idempotency on the write path
Why the cache + CDN discussion is its own session (Part 2)

Pre-read (skim before the session)

Deep dive

1. The shape of the problem

TinyURL looks trivial until you put numbers on it. Design for 100M new URLs/day, 10B reads/day, 5-year retention and let the layers fall out.

2. Back-of-envelope (always do this first)

Writes: 100M / 86,400 s ≈ 1.2k QPS average, 6k QPS at 5× peak.
Reads: 10B / 86,400 s ≈ 115k QPS average, 600k QPS at peak. R:W ≈ 100:1.
Storage: 100M × 365 × 5 ≈ 180 B records. Each record ≈ 500 B (short, long, owner, created_at, expires_at, click_count) → 90 TB. Plus 2–3× replication.

State these numbers out loud before you talk about anything else. They drive every subsequent decision.

3. ID / short-code scheme — three real options

Option A — Hash(long_url)[:7] in base62

✅ Deterministic, idempotent (same long → same short).
❌ Collisions need a check-and-retry against an index. Extra round trip.
❌ Doesn't help if two users want different shorts for the same URL.

Option B — Auto-increment counter + base62

✅ No collisions ever. Shortest codes (6–7 chars for 100B records: log₆₂(1e11) ≈ 6.1).
❌ Single counter = single point of contention.
✅ Mitigate with batch allocation: each app server reserves 10k IDs at once via INCRBY counter 10000 on Redis. Local cache, lock-free use, refill when low.

Option C — Snowflake (64-bit)

41 bits timestamp | 10 bits machine_id | 12 bits sequence → coordination-free, sortable by time.
Base62 of 64 bits = 11 chars (too long). Truncate to 48 bits → 8 chars.
✅ No coordination required across machines.
❌ Codes leak creation time and machine_id (mild info leak).

Pick: For 100M/day I'd choose counter with batched allocation (Option B). Simplest, shortest codes, contention solved by batching. State the alternative and why you rejected it.

4. Storage — the access pattern is point-lookup by short code

This is textbook KV: O(1) lookup by single key, no scans.

Choice	Fit
DynamoDB / Cassandra	Hash-partitioned by short_code. Horizontal scale. 90 TB is fine.
Bigtable	Same model, GCP-flavoured.
Postgres (sharded)	Works to a point. At 600k read QPS you need read replicas + Citus. Operationally heavier.
Redis (sole store)	Too expensive at 90 TB and not durable enough for source of truth.

Pick: DynamoDB (or Cassandra if you self-host). Hash-partition by short_code.

Schema:

short_code  STRING  PK
long_url    STRING
owner_id    STRING
created_at  TIMESTAMP
expires_at  TIMESTAMP
meta        MAP<STRING,STRING>      -- UTM, custom tags

Add a secondary index on (owner_id, created_at) only if you need owner-scoped listings — they're not on the hot path.

5. Write path

1. Validate URL (length ≤ 2048, scheme http/https, blocklist).
2. Idempotency check: if hash(long_url, owner) already exists → return it.
3. Pop next short_code from local batch (refill via INCRBY 10000 on Redis).
4. Conditional put to DynamoDB (so we catch the very rare race).
5. Async publish to Kafka — analytics consumer aggregates later.
6. Return 201 with short URL.

Two failure modes to think about now:

Counter shard dies. Batched IDs in app caches keep writes alive for minutes. Alert, re-elect.
Conditional put fails. Hash collision (rare with 64-bit codes). Increment local counter and retry.

6. Owner / multi-tenant data model

If you want custom aliases (https://tinyurl.com/dinesh-blog) you need:

Uniqueness across the global namespace.
A reserved-words blocklist (admin, api, login, etc.).
Per-owner quota tracking (count + storage).
A second logical table aliases (alias_str PK, short_code, owner_id) so resolution still goes through the same KV path.

7. Putting it on the map

┌───────┐     ┌───────────┐
│ User  │────▶│ Load bal. │ HTTPS
└───────┘     └────┬──────┘
                   │
            ┌──────▼──────┐
            │  Shorten    │  POST /api/shorten
            │  service    │  (validates, picks id)
            └──────┬──────┘
                   │   put
            ┌──────▼──────┐    INCRBY 10000
            │  DynamoDB   │◀───── Redis counter shard
            └──────┬──────┘
                   │ async event
            ┌──────▼──────┐
            │   Kafka     │ → analytics consumer
            └─────────────┘

Read path lives in Part 2 (Session 8) — that's where cache, CDN, hot-key handling, and stale-while-revalidate go. Putting both in one session is what made the 28-day plan feel rushed.

8. Numbers you should be able to defend

Why 6–7 char codes? log₆₂(180B / collision-safety-factor) ≈ 6.5 — pick 7 for headroom.
Why DynamoDB over Postgres? At 600k read QPS, sharding Postgres is a job in itself. DynamoDB charges by request and storage — predictable.
Why batched IDs? At 6k peak write QPS with a 10k batch, you refill once every ≈ 1.7 s per app server. Counter QPS = app_servers / 1.7.
Why conditional put? Race window is microseconds, but at 100M/day even microsecond races happen daily.

9. What we're saving for Part 2 (Session 8)

Cache layer (Redis cluster), 95% hit-rate math at Zipf 0.9
CDN edge — doing the 302 at the edge for read-heavy keys
Hot-key problem — viral short codes
Abuse / safe-browsing integration
Analytics — Kafka → Flink → ClickHouse
Multi-region trade-offs

Reading material

Books:

System Design Interview Vol. 1 — Alex Xu (ch. 8: Design a URL Shortener)
Designing Data-Intensive Applications — Martin Kleppmann (chs. 2, 3, 6 for storage & indexing)
Database Internals — Alex Petrov (ch. on B-trees vs LSM trees — why Postgres vs Cassandra for the mapping table)

Papers:

Dynamo: Amazon's Highly Available Key-value Store (SOSP 2007) — the lookup pattern at the heart of any KV-backed shortener.
Bigtable: A Distributed Storage System for Structured Data (OSDI 2006) — wide-column design alternative.

Official docs:

Blog posts:

How Bitly works (engineering) — bitly's own writeups.
Hashids: short, unique, non-sequential IDs — the encoding scheme used by half the industry.
Designing a URL Shortener — Code Karle — clean writeup with capacity math.

In-depth research material

Twitter Snowflake — github.com/twitter-archive/snowflake — the 64-bit ID generator everyone copies.
Sony's sonyflake — github.com/sony/sonyflake — Snowflake-style with longer machine lifetime.
Instagram Engineering — Sharding & IDs — how Instagram generates IDs across shards.
Discord — How Discord Stores Billions of Messages — same KV-lookup pattern at scale.
system-design-primer — github.com/donnemartin/system-design-primer — ~270k ★, includes URL shortener exercises.
Stripe Engineering — Designing APIs for humans (idempotency keys) — relevant for write idempotency on POST /shorten.

Videos

Beginner System Design Interview: Design Bitly w/ a Ex-Meta Staff Engineer — Hello Interview · 59 min — the most thorough modern walkthrough; covers numbers, capacity, ID schemes, KV vs SQL.
Design a URL Shortener (Bitly) — System Design Interview — NeetCodeIO · 48 min — clean, interview-paced, with whiteboard diagrams.
Tiny URL — System Design Interview Question — TechPrep · 9 min — quick capacity-math refresher; useful as a primer.
How Does a URL Shortener Work? — ByteByteGo · 6 min — animated overview from Alex Xu's channel.
TinyURL System Design — codeKarle — codeKarle · 24 min — different framing (capacity → API → DB schema → cache); a useful second perspective.

LeetCode — Encode And Decode Tinyurl

Link: https://leetcode.com/problems/encode-and-decode-tinyurl/
Difficulty: Medium
Why this problem: Counter + base62 keeps codes short; hash-based adds idempotency at cost of collisions.
Time-box: 30 minutes. Look up the editorial only after.

Post-session checklist

By the end of this session you should be able to:

State the back-of-envelope numbers (QPS write, QPS read, storage) for the spec.
Pick an ID scheme and defend the choice; name what you sacrificed.
Defend the choice of KV store over RDBMS at this scale.
Sketch the write path including idempotency and async analytics.
Solve the LeetCode encode-and-decode-tinyurl problem using your chosen scheme.

Generated from sessions_data.py + content_part*.py. To edit a video / leetcode / title, edit the data file and re-run write_sessions.py.

← previous

Spark Part 1 — Driver, Executors, RDDs, Lazy Evaluation

SOLID Part 1 — SRP, OCP, LSP with Python Examples