Day 09 — CAP, PACELC, Consensus — Raft, Quorums, and Realistic Trade-offs
Every distributed system has to make a CAP-style call. Understanding Raft / Paxos and quorum reads/writes is what lets engineers reason precisely instead of wav…
CAP gets all the airtime but is the least useful version of the trade-off. PACELC is the one you actually need at design time: if Partition then Availability vs Consistency, Else Latency vs Consistency. Most production systems live in the "Else" branch 99.99% of the time.
🧠 Concept
Why it matters & the mental model.
1. Definitions, precisely
- Consistency (CAP-C, linearizability): every read sees the most recent write — as if there were a single copy.
- Availability: every request gets a non-error response (no waiting for a quorum that's gone).
- Partition tolerance: system keeps functioning despite network splits. You always have P in practice → so under partition you must pick A or C.
2. PACELC examples
The choice between consistency and availability shows up in real databases as concrete defaults.
3. Raft — consensus you can actually understand
Raft elects a single leader; all writes go through it. Leader appends to its log and replicates to followers; an entry is committed once a majority have stored it.
Key invariants:
- Election safety: at most one leader per term.
- Log matching: if two logs share an entry at the same index+term, all prior entries match.
- State machine safety: once a log entry is applied, no other entry will ever be applied at that index.
4. Leader election
Each node has a term (monotonic). On timeout (~150-300 ms randomised), a follower becomes candidate, increments term, requests votes. A node votes once per term, only if the candidate's log is at least as up-to-date. Candidate with majority votes → leader. Randomised timeouts kill split votes.
🛠 Deep Dive
Internals, code, architecture.
5. Quorums and the magic of N/2+1
Why majority? Two quorums of size > N/2 must intersect → at least one node has the latest committed write. R+W>N (Dynamo-style) is the generalisation: pick R, W so any read and write quorum overlap.
- R=W=N/2+1: linearisable, balanced.
- W=N, R=1: fast reads, fragile writes.
- W=1, R=N: opposite.
6. Paxos, Multi-Paxos, EPaxos, Zab
- Basic Paxos: prove one value chosen; correct but legendarily hard to read.
- Multi-Paxos: stream of values, with a stable leader (essentially Raft minus pedagogy).
- EPaxos: leaderless, commutative writes commit in 1 RTT — used in research / CockroachDB-like systems.
- Zab: ZooKeeper's protocol, very similar to Raft.
7. Snapshots & log compaction
Logs grow forever. Periodically take a state-machine snapshot, truncate log behind it, ship snapshot to lagging followers. Trade-off: snapshot cost vs replay cost on restart.
8. Read paths
Three options with different cost/consistency:
- Leader read with leadership lease: linearisable, no quorum round trip per read.
- Read index: leader asks majority "am I still leader?" then serves; linearisable, one RTT.
- Follower read with bounded staleness: cheaper, returns "x as of timestamp T".
🚀 In Practice
Trade-offs, exercises, what to ship today.
9. Real-world wrinkles
- Stop-the-world GC can make a leader miss heartbeats → spurious election. Tune GC or use languages without it for critical paths.
- Network blips (BGP flaps, k8s overlay) trigger elections; watch
leader_changes_total. - Cross-region majority quorums add inter-region RTT to every write — sometimes you accept it (Spanner), sometimes you partition by region (single-region writes, async replication).
10. Beyond consensus — when you don't need it
Not every write needs linearisability. CRDTs (counters, sets, maps with merge-commutative ops) give eventual consistency without consensus, used in Redis Enterprise, Riak, collaborative editors (Figma, Notion). Pick CRDTs when offline + multi-writer is required.
11. Picking C or A — decision rubric
- Money / inventory / auth tokens → C (Spanner, etcd, RDBMS with sync replication).
- Likes / view counts / session tokens → A (Dynamo / Cassandra with eventual reads).
- Mixed → split your data model; small consistent core + large eventually-consistent periphery.
12. What to take away
"How do you handle a split brain?" Strong answers name the protocol (Raft/Paxos), explain quorum intersection, mention leases for fast reads, and acknowledge GC / clock skew as real-world traps. Bonus: differentiate CAP from PACELC.
Resources
- 🎥 MIT 6.824 — Raft (Robert Morris)
- 📖 In Search of an Understandable Consensus Algorithm (Raft paper)
- 📖 Raft visualization (interactive)
- 📖 Designing Data-Intensive Apps — Ch. 9 (Consistency & Consensus)
Practice Problem: Number of Islands (Medium)