Day 14 — Sharding, Replication & Multi-Region Databases
The moment one database can't hold your data, you shard. The moment one region can't serve your users, you go multi-region. Both decisions cascade into every ot…
Sharding distributes data across nodes by partition key. Replication keeps copies across nodes/regions. Together they buy you scale and availability — but every choice locks in trade-offs that are painful to reverse.
🧠 Concept
Why it matters & the mental model.
1. Sharding strategies
- Range: partition by sorted key range (e.g. user_id 0-999, 1000-1999). Great for range scans, hot spots if data is skewed.
- Hash: partition by hash(key) % N. Even distribution, but range scans become scatter-gather.
- Directory / consistent hashing: hash to a virtual node, virtual nodes mapped to physical nodes by a lookup table → can rebalance without rehashing the world (Dynamo, Cassandra, Riak).
- Geographic: shard by region (data lives near users) → low latency, complex cross-region queries.
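To make the range-versus-hash trade-off concrete, here is a minimal routing sketch. The shard boundaries, the function names, and the use of MD5 as a stable hash are illustrative assumptions, not taken from any particular database.

```python
import bisect
import hashlib

# Hypothetical boundaries for three range shards:
# shard 0 owns ids below 1000, shard 1 owns 1000-1999, shard 2 owns the rest.
RANGE_BOUNDS = [1000, 2000]


def range_shard(user_id: int) -> int:
    """Return the index of the range shard that owns user_id."""
    return bisect.bisect_right(RANGE_BOUNDS, user_id)


def hash_shard(key: str, num_shards: int) -> int:
    """Return a shard index from a stable hash of the key.

    Python's built-in hash() is salted per process for strings, so a
    digest is used to keep routing identical across processes.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def shards_for_range(lo: int, hi: int) -> set[int]:
    """Range sharding: a scan over [lo, hi] touches only the owning shards."""
    return set(range(range_shard(lo), range_shard(hi) + 1))
```

With range sharding, shards_for_range(100, 900) touches a single shard. Under hash sharding the same scan has no such shortcut and must query every shard.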
2. The shard key — your single biggest decision
A good shard key:
- High cardinality (millions+ of distinct values).
- Even distribution of access (no celebrity keys).
- Common in queries (so most queries hit one shard).
- Stable (the value never changes for a row; otherwise the row has to move between shards).
user_id is a good default for B2C. For B2B, tenant_id often beats user_id because cross-user queries within a tenant are common.
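One way to sanity-check a candidate shard key is to replay a sample of accesses and look at how unevenly they land. The sketch below assumes hash-mod placement and a hypothetical access log that is simply a list of key values. The imbalance ratio (busiest shard over mean load) is an illustrative metric, not a standard one.

```python
import hashlib
from collections import Counter


def shard_load(access_log: list[str], num_shards: int) -> Counter:
    """Count how many accesses each shard would receive under hash-mod placement."""
    load = Counter()
    for key in access_log:
        digest = hashlib.md5(key.encode("utf-8")).digest()
        load[int.from_bytes(digest[:8], "big") % num_shards] += 1
    return load


def imbalance(load: Counter, num_shards: int) -> float:
    """Ratio of the busiest shard's load to the mean load (1.0 is perfectly even)."""
    mean = sum(load.values()) / num_shards
    return max(load.values()) / mean


# A celebrity key dominates the traffic, so one shard becomes hot
# no matter how evenly the remaining keys hash.
skewed_log = ["celebrity"] * 900 + [f"user:{i}" for i in range(100)]
even_log = [f"user:{i}" for i in range(1000)]
```

Comparing imbalance(shard_load(skewed_log, 4), 4) with the same call on even_log shows why high cardinality alone is not enough: access also has to be spread across the keys.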
3. Replication topologies
- Single leader: simple, linearisable writes via leader. Most RDBMS, MongoDB.
- Multi-leader: writes anywhere, async replication, conflicts to resolve. Used in multi-region setups when sync replication latency is unacceptable.
- Leaderless (Dynamo, Cassandra, Riak): any replica accepts writes, R+W>N quorums, anti-entropy + read repair to converge.
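The quorum rule and read repair in the leaderless model can be sketched as follows. The Versioned type and its integer version counter are simplifying assumptions; real systems typically use timestamps or vector clocks to order versions.

```python
from dataclasses import dataclass


def quorum_overlaps(n: int, r: int, w: int) -> bool:
    """True when every read quorum shares at least one replica with every write quorum."""
    return r + w > n


@dataclass(frozen=True)
class Versioned:
    value: str
    version: int  # assumption: a higher number means a newer write


def read_with_repair(responses: dict[str, Versioned]) -> tuple[Versioned, list[str]]:
    """Pick the newest value among replica responses and list the stale replicas.

    The caller would write the newest value back to the stale replicas
    (read repair) so that they converge.
    """
    newest = max(responses.values(), key=lambda v: v.version)
    stale = sorted(name for name, v in responses.items() if v.version < newest.version)
    return newest, stale
```

For example, N=3 with R=2 and W=2 overlaps, while R=1 and W=1 does not, so a read at ONE may miss the latest write.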
🛠 Deep Dive
Internals, code, architecture.
4. Sync vs async, region-by-region
- Sync within a region (sub-ms to low single-digit ms between zones): fine. Spanner's regional configurations do this with Paxos across zones in one region.
- Sync across regions (50-200 ms RTT): every write pays that latency. Only acceptable for low-write workloads (configuration, leases).
- Async cross-region: cheap but means RPO > 0 (you can lose recent writes if the region dies).
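The latency cost of each option can be estimated with a simple model: the leader sends to all replicas in parallel and commits once the required number have acknowledged. The function name and the RTT figures below are illustrative assumptions.

```python
def commit_latency_ms(local_ms: float, replica_rtts_ms: list[float], acks_required: int) -> float:
    """Estimate write latency when the leader waits for acks_required replica acks.

    Replication requests go out in parallel, so the wait equals the RTT of
    the k-th fastest replica. acks_required=0 models fully async replication.
    """
    if not 0 <= acks_required <= len(replica_rtts_ms):
        raise ValueError("acks_required must be between 0 and the replica count")
    if acks_required == 0:
        return local_ms
    return local_ms + sorted(replica_rtts_ms)[acks_required - 1]


# Hypothetical topology: one same-region replica and two remote regions.
RTTS_MS = [0.5, 80.0, 150.0]
```

With these numbers, waiting for one ack costs 1.5 ms on a 1 ms local commit, waiting for two costs 81 ms, and waiting for none costs 1 ms but leaves unreplicated writes exposed.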
5. Multi-region patterns
- Active-passive: one write region, others read-only / standby. Simple, failover is a deliberate event.
- Active-active region-pinned: each user's data has a "home region" (sharded by region). Writes always local. Cross-region writes need 2-phase coordination.
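- Active-active globally consistent (Spanner, CockroachDB, FaunaDB): synchronous global writes via a consensus protocol (Paxos in Spanner, Raft in CockroachDB) plus synchronised clocks (TrueTime) or hybrid logical clocks (HLC). Expensive, beautiful.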
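The region-pinned pattern comes down to a routing decision per write. A minimal sketch, assuming a hypothetical in-memory directory of home regions (the region names and user ids are made up):

```python
# Hypothetical directory mapping each user to a home region.
HOME_REGION = {"alice": "eu-west", "bob": "us-east", "carol": "eu-west"}


def route_write(user_id: str, serving_region: str) -> tuple[str, bool]:
    """Return (region that executes the write, whether it must be forwarded)."""
    home = HOME_REGION[user_id]
    return home, home != serving_region


def needs_cross_region_coordination(user_ids: list[str]) -> bool:
    """A write touching users with different home regions cannot stay local."""
    return len({HOME_REGION[u] for u in user_ids}) > 1
```

A write for alice arriving in eu-west stays local. A transfer between alice and bob spans two home regions and falls into the coordinated path.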
6. Resharding — the dragon
Adding shards when you outgrow N is painful with hash-mod sharding (almost every key moves). Mitigations:
- Consistent hashing + virtual nodes: adding a node moves only about 1/N of the keys (N = node count after the addition), and they all move to the new node.
- Logical → physical indirection: many logical shards mapped to few physical; split a physical when hot.
- Vitess: online resharding that copies data to the new shards, keeps them in sync through replication, then cuts traffic over.
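The difference between hash-mod and consistent hashing when growing from 4 to 5 nodes can be measured with a small ring. This is a sketch under stated assumptions: the Ring class, the 100 virtual nodes per physical node, and MD5 as the hash are illustrative choices, not how any specific database implements it.

```python
import bisect
import hashlib


def stable_hash(s: str) -> int:
    """64-bit hash that is identical across processes (unlike built-in hash())."""
    return int.from_bytes(hashlib.md5(s.encode("utf-8")).digest()[:8], "big")


class Ring:
    """Consistent-hash ring where each physical node owns many virtual points."""

    def __init__(self, nodes: list[str], vnodes: int = 100):
        self._points = sorted((stable_hash(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
        self._hashes = [h for h, _ in self._points]

    def owner(self, key: str) -> str:
        # First virtual point clockwise from the key's hash, wrapping around.
        i = bisect.bisect_right(self._hashes, stable_hash(key)) % len(self._points)
        return self._points[i][1]


def moved_fraction_ring(keys: list[str], old: Ring, new: Ring) -> float:
    return sum(old.owner(k) != new.owner(k) for k in keys) / len(keys)


def moved_fraction_mod(keys: list[str], old_n: int, new_n: int) -> float:
    return sum(stable_hash(k) % old_n != stable_hash(k) % new_n for k in keys) / len(keys)


KEYS = [f"user:{i}" for i in range(2000)]
OLD = Ring(["n0", "n1", "n2", "n3"])
NEW = Ring(["n0", "n1", "n2", "n3", "n4"])
```

On this sample, moved_fraction_mod(KEYS, 4, 5) should come out near 0.8, while moved_fraction_ring(KEYS, OLD, NEW) should be roughly 0.2, with every moved key landing on n4.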
🚀 In Practice
Trade-offs, exercises, what to ship today.
7. Cross-shard queries
The dirty secret: most apps eventually need joins across shards. Options:
- Denormalise at write time (Cassandra style).
- Scatter-gather from app: slow, but explicit.
- Fan-out via async pipeline (CDC → search index) for analytic queries.
- Distributed SQL (CockroachDB, Yugabyte, TiDB): joins across shards happen transparently, but you pay for them in latency.
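Scatter-gather from the application can be sketched as a parallel fan-out followed by a merge. The shards here are in-memory lists standing in for real database connections, and the row shape and function names are hypothetical.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

# In-memory stand-ins for three shards; a real system would hold connections.
SHARDS = [
    [{"user_id": 1, "score": 50}, {"user_id": 4, "score": 91}],
    [{"user_id": 2, "score": 77}, {"user_id": 5, "score": 12}],
    [{"user_id": 3, "score": 88}, {"user_id": 6, "score": 64}],
]


def query_shard(rows: list[dict], k: int) -> list[dict]:
    """Per-shard step: each shard returns only its own top k rows."""
    return heapq.nlargest(k, rows, key=lambda r: r["score"])


def scatter_gather_top_k(shards: list[list[dict]], k: int) -> list[dict]:
    """Fan the query out to every shard in parallel, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda rows: query_shard(rows, k), shards))
    merged = (row for part in partials for row in part)
    return heapq.nlargest(k, merged, key=lambda r: r["score"])
```

Total latency is bounded by the slowest shard, which is why this path is explicit but slow. Pushing the top-k limit down to each shard keeps the merge step small.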
8. Consistency knobs per query
Per-query consistency is a feature: SQL isolation levels such as READ COMMITTED versus locking reads (SELECT … FOR UPDATE), Cassandra consistency levels ONE/QUORUM/ALL, DynamoDB ConsistentRead=true. Use strongly consistent reads only where required (balance, auth); use cheaper reads everywhere else.
9. Disaster recovery
Define RPO (data loss budget, e.g. 30 s) and RTO (downtime budget, e.g. 5 min). They drive replication mode and rehearsal frequency. A DR plan you've never tested doesn't work.
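The link between the budgets and what you measure can be written down as a simple check. This sketch assumes async replication, where the worst observed replication lag approximates the data you would lose, and a failover time measured in a rehearsal. The type and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrTargets:
    rpo_s: float  # data loss budget in seconds
    rto_s: float  # downtime budget in seconds


def check_dr(targets: DrTargets, worst_replication_lag_s: float, rehearsed_failover_s: float) -> dict[str, bool]:
    """Compare measured values against the budgets.

    Assumption: with async replication, writes younger than the replication
    lag are lost on region failure, so the lag bounds the achievable RPO.
    """
    return {
        "rpo_ok": worst_replication_lag_s <= targets.rpo_s,
        "rto_ok": rehearsed_failover_s <= targets.rto_s,
    }


TARGETS = DrTargets(rpo_s=30, rto_s=300)  # the 30 s / 5 min example from the text
```

If check_dr(TARGETS, 12, 420) reports rto_ok as False, the replication mode is adequate but the failover procedure needs work. Only a rehearsal produces the second number.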
10. What to take away
Always state shard key, replication mode, write region routing, and consistency per use case. The senior tell: name a real failure mode ("if region A is partitioned from B, A may keep believing it is the leader for up to 30 s after B elects a new one; writes A accepts in that window can be lost, and we accept that RPO").
Resources
- 🎥 ByteByteGo — Database Sharding Explained
- 📖 DDIA — Ch. 5 (Replication) + Ch. 6 (Partitioning)
- 📖 Vitess — How sharding actually works
- 📖 Spanner whitepaper
Practice Problem: Design Twitter (Medium)