Designing a Chat System — Connections, Fanout, Storage, Delivery
Session 22 of the 48-session learning series.
Why this session matters
This is Session 22 of 48 in the System Design track. Like the rest of the series, it covers one focused topic at a pace that leaves time to absorb it.
Agenda
- Requirements — 1-1 chat, groups, presence, delivery receipts, history
- Connection layer — long-poll vs WebSocket vs MQTT; sharding connections
- Routing & fanout — chat servers, pub/sub, per-chat partitions
- Storage — messages, conversation index, attachments, read receipts
- Delivery semantics — exactly-once-ish, ordering, offline queue
Pre-read (skim before the session)
- WhatsApp Engineering — Scaling Erlang
- ByteByteGo — Designing a Chat System
- Discord — How Discord stores billions of messages
- Signal Protocol (Open Whisper Systems)
Deep dive
1. Sketching requirements
A "chat system" can mean many products. Typical interview / product scope:
- 1-1 chat and group chat (≤500 members).
- Presence (online / typing / last seen).
- Read receipts.
- History (search, pagination back N months).
- Multi-device sync.
- Push notifications when offline.
- Optional: voice/video, e2e encryption, attachments.
Scale to design for: 100 M DAU, 100 K msgs/sec peak, 10:1 read:write, 5-second p99 delivery.
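A quick back-of-envelope pass over these targets can anchor the rest of the design. This is only a sketch: the 200-byte average message size is an assumption made for illustration, not a figure from this session, and the daily storage number is an upper bound because it treats the peak rate as if it were sustained.

```python
# Back-of-envelope sizing from the stated targets.
PEAK_MSGS_PER_SEC = 100_000   # from the scale target above
READ_WRITE_RATIO = 10         # 10:1 read:write, from the scale target above
AVG_MSG_BYTES = 200           # ASSUMPTION: average stored message size

peak_reads_per_sec = PEAK_MSGS_PER_SEC * READ_WRITE_RATIO
peak_ingest_mb_per_sec = PEAK_MSGS_PER_SEC * AVG_MSG_BYTES / 1_000_000
# Upper bound only: pretends the peak write rate lasts all day.
daily_storage_upper_bound_tb = peak_ingest_mb_per_sec * 86_400 / 1_000_000

print(peak_reads_per_sec, peak_ingest_mb_per_sec, daily_storage_upper_bound_tb)
```

Changing AVG_MSG_BYTES is the quickest way to see how sensitive the storage estimate is to attachments or metadata.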
2. The core tension
Chat sits at the intersection of a persistent connection layer (stateful: you must know which server each user is connected to) and a durable storage layer behind stateless services (partition and replicate the data; scale the services horizontally).
The clean separation:
[ Client ] ──┐
             ├── WebSocket ──▶ [ Connection Server ]
[ Client ] ──┘                           │
                                         │ pub/sub
                                         ▼
                                 [ Chat Service ] ←──▶ [ Storage (KV / wide-col) ]
                                         │
                                         ▼
                                 [ Push Service ] (APNs/FCM when offline)
3. Connection layer
Long polling — client opens HTTP; server holds until message or timeout (30 s). Works through every firewall. High overhead per message.
WebSocket — full-duplex; one TCP connection per device. Standardized as RFC 6455 in 2011 and now the default choice for real-time web chat. A modest server can hold on the order of 100K connections.
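MQTT — lightweight publish/subscribe protocol built for lossy networks (cellular). Facebook Messenger adopted it for mobile; WhatsApp instead uses a customized XMPP-derived protocol. Its small fixed header keeps per-message overhead low on mobile.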
HTTP/3 + WebTransport — newer; still rolling out.
For interviews: pick WebSocket. Mention MQTT for mobile.
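The long-polling behaviour described in this section (hold the request until a message arrives or a timeout fires) can be sketched with the standard library alone. This is a minimal illustration, not a server: the `asyncio.Queue` stands in for a per-user inbox, the function name `long_poll` is hypothetical, and the timeout is shortened from 30 s so the demo finishes quickly.

```python
import asyncio


async def long_poll(inbox: asyncio.Queue, timeout_s: float) -> list:
    """Hold the 'request' until a message arrives or the timeout fires."""
    try:
        return [await asyncio.wait_for(inbox.get(), timeout=timeout_s)]
    except asyncio.TimeoutError:
        # Empty response: the client is expected to re-poll immediately.
        return []


async def demo():
    inbox = asyncio.Queue()
    empty = await long_poll(inbox, timeout_s=0.05)  # nothing queued yet
    await inbox.put("hi")
    got = await long_poll(inbox, timeout_s=0.05)    # returns right away
    return empty, got


empty, got = asyncio.run(demo())
```

Every message costs a full request/response cycle here, which is the per-message overhead that pushes most designs toward WebSocket.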
4. Connection sharding
You'll have millions of connections. Distribute across thousands of connection servers. Each connection registered in a presence service:
presence: user_id → (connection_server, session_id)
Lookup: O(1) hash. Storage: Redis cluster or Bigtable-like KV.
When user A sends to user B, the chat service asks presence "where is B?" and pushes to that connection server.
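The presence mapping and the "where is B?" lookup can be sketched as a small registry. This is an in-memory stand-in for the Redis or KV store, assuming one connection per user for brevity (a multi-device version would keep one entry per device); the class and method names are hypothetical.

```python
from typing import Dict, Optional, Tuple


class PresenceRegistry:
    """In-memory stand-in for the presence store (hypothetical API)."""

    def __init__(self) -> None:
        # user_id -> (connection_server, session_id)
        self._where: Dict[str, Tuple[str, str]] = {}

    def register(self, user_id: str, connection_server: str, session_id: str) -> None:
        self._where[user_id] = (connection_server, session_id)

    def unregister(self, user_id: str, session_id: str) -> None:
        # Only remove the entry if the session matches, so a late disconnect
        # from an old connection cannot erase a newer one.
        current = self._where.get(user_id)
        if current is not None and current[1] == session_id:
            del self._where[user_id]

    def locate(self, user_id: str) -> Optional[Tuple[str, str]]:
        # O(1) hash lookup; None means "offline".
        return self._where.get(user_id)


registry = PresenceRegistry()
registry.register("B", "conn-7", "s1")
print(registry.locate("B"))
```

The session check in `unregister` matters in practice because reconnects often race with the teardown of the previous connection.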
5. Routing the message
1. Client A → Conn Server X: "send msg to B in chat C"
2. Conn X → Chat Service: persist + validate
3. Chat Service → Storage: append to messages(chat=C)
4. Chat Service → Pub/Sub: publish (chat=C, msg)
5. All Conn Servers subscribed to chat=C receive
6. Each looks up its local subscribers → push to them
Two pub/sub topologies:
- Per-chat topics. Million chats = million topics. Brokers don't love this.
- Per-connection-server fanout. Chat Service knows "chat C has members [A, D, E]"; looks up presence for each; sends individual nudges to relevant Conn Servers.
WhatsApp- and Slack-style designs are usually described as leaning toward the second: the chat service looks up the member list on every send.
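The second topology (look up presence per member, then nudge only the relevant connection servers) can be sketched as a grouping step. The function name `plan_fanout` and the plain dict used for presence are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def plan_fanout(
    members: List[str], presence: Dict[str, str], sender_id: str
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Group recipients by connection server; collect offline users separately.

    presence maps user_id -> connection_server for currently connected users.
    """
    by_server: Dict[str, List[str]] = defaultdict(list)
    offline: List[str] = []
    for user_id in members:
        if user_id == sender_id:
            continue  # the sender already has the message
        server = presence.get(user_id)
        if server is None:
            offline.append(user_id)  # handled by the offline/push path
        else:
            by_server[server].append(user_id)
    return dict(by_server), offline


nudges, offline = plan_fanout(["A", "D", "E"], {"A": "conn-2", "D": "conn-1"}, "A")
print(nudges, offline)
```

Sending one nudge per server rather than one per member is what keeps this approach workable for larger groups.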
6. Storage — message layout
A wide-column store (Cassandra, ScyllaDB, Bigtable, DynamoDB):
PK = (chat_id, time_bucket)
CK = (timestamp, msg_id)
columns: sender_id, body, attachments[], delivery, read_by[]
time_bucket (e.g., yearmonth) keeps partitions bounded. Within a bucket, sorted by timestamp → range scans for "load last 50 msgs" are local.
Discord: messages are partitioned by (channel_id, bucket) and clustered by message_id, where message_id is a Snowflake that encodes the timestamp. They migrated their message store twice as they grew: from MongoDB to Cassandra, then from Cassandra to ScyllaDB.
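To make the (chat_id, time_bucket) layout concrete, here is a small sketch of computing the partition key and serving "load last 50 msgs" by walking buckets backwards. A plain dict stands in for the wide-column store, and all function names are hypothetical.

```python
from datetime import datetime, timezone


def time_bucket(ts: datetime) -> str:
    return ts.strftime("%Y%m")  # yearmonth bucket, e.g. "202402"


def partition_key(chat_id: str, ts: datetime) -> tuple:
    return (chat_id, time_bucket(ts))


def previous_bucket(bucket: str) -> str:
    year, month = int(bucket[:4]), int(bucket[4:])
    return f"{year - 1}12" if month == 1 else f"{year}{month - 1:02d}"


def load_last(store: dict, chat_id: str, now: datetime, limit: int = 50,
              max_buckets: int = 12) -> list:
    """store: {(chat_id, bucket): rows sorted ascending by (timestamp, msg_id)}."""
    out: list = []
    bucket = time_bucket(now)
    for _ in range(max_buckets):
        rows = store.get((chat_id, bucket), [])
        need = limit - len(out)
        out = rows[-need:] + out  # newest rows of an older bucket go in front
        if len(out) >= limit:
            break
        bucket = previous_bucket(bucket)
    return out


store = {
    ("c1", "202401"): [(1, "m1", "a"), (2, "m2", "b")],
    ("c1", "202402"): [(3, "m3", "c")],
}
now = datetime(2024, 2, 15, tzinfo=timezone.utc)
print(load_last(store, "c1", now, limit=2))
```

Most reads touch only the current bucket; the backwards walk only matters for quiet chats or right after a bucket boundary.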
7. Conversation index
Per user, you need "list my chats with last message + unread count":
PK = user_id
CK = last_activity_ts desc
columns: chat_id, last_msg_preview, unread_count
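Updated on every send/receive. Because last_activity_ts is part of the clustering key, each update is a delete plus insert of the chat's row rather than an in-place write. Very active chats become a write hotspot for every member's index; mitigate with write batching.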
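A minimal sketch of the conversation index update, using nested dicts in place of the per-user partition. The function names and the 40-character preview length are assumptions for illustration.

```python
def apply_message(index: dict, chat_id: str, ts: int, body: str,
                  sender_id: str, members: list, preview_len: int = 40) -> None:
    """index: {user_id: {chat_id: row}}; one row per (user, chat)."""
    for user_id in members:
        row = index.setdefault(user_id, {}).setdefault(
            chat_id,
            {"last_activity_ts": 0, "last_msg_preview": "", "unread_count": 0},
        )
        row["last_activity_ts"] = ts
        row["last_msg_preview"] = body[:preview_len]
        if user_id != sender_id:
            row["unread_count"] += 1  # the sender has implicitly read it


def list_chats(index: dict, user_id: str) -> list:
    """Chats for one user, most recent activity first."""
    rows = index.get(user_id, {})
    return sorted(rows.items(), key=lambda kv: kv[1]["last_activity_ts"], reverse=True)


index: dict = {}
apply_message(index, "c1", 10, "hello", "A", ["A", "B"])
apply_message(index, "c2", 20, "yo", "B", ["A", "B"])
print([chat_id for chat_id, _ in list_chats(index, "A")])
```

Each message writes one row per member, which is the fan-out that write batching is meant to absorb.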
8. Delivery semantics
Chat is at-least-once with dedup on the client. Server assigns a global message id (Snowflake-like). Client dedupes by id when syncing across devices.
Ordering: per chat, by message id. Snowflake-like ids are only roughly time-ordered across workers, so strict per-chat ordering needs ids issued by a single sequencer per chat (or per partition). Multi-device: each device tracks its last-seen id; on connect, it requests "give me everything > last_seen".
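Client-side dedup and the "everything > last_seen" catch-up can be sketched as follows. The class and function names are hypothetical, and integer ids are assumed to be comparable in delivery order within a chat.

```python
class DeviceInbox:
    """Per-device view of one chat: dedupes redelivered messages by id."""

    def __init__(self) -> None:
        self.seen_ids: set = set()
        self.messages: list = []
        self.last_seen = 0

    def receive(self, msg_id: int, body: str) -> bool:
        if msg_id in self.seen_ids:
            return False  # duplicate from an at-least-once redelivery
        self.seen_ids.add(msg_id)
        self.messages.append((msg_id, body))
        self.messages.sort()  # keep display order by id
        self.last_seen = max(self.last_seen, msg_id)
        return True


def sync_since(server_log: list, last_seen: int) -> list:
    """Server side of the reconnect request: everything newer than last_seen."""
    return [(msg_id, body) for msg_id, body in server_log if msg_id > last_seen]


inbox = DeviceInbox()
inbox.receive(101, "hi")
inbox.receive(101, "hi")  # redelivery, ignored
server_log = [(100, "old"), (101, "hi"), (102, "new")]
print(sync_since(server_log, inbox.last_seen))
```

A single high-water mark is the simplest scheme; it assumes the device received everything below last_seen, so a client that can receive out of order would need to track gaps as well.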
9. Read receipts and presence
Read receipt = update to the read_by set for a message. High write amplification (one message in a 500-member group can trigger up to 500 set updates, one per reader). Batch + debounce.
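One way to batch and debounce read receipts is to coalesce them per (chat, user) and flush periodically. The sketch below assumes message ids are ordered within a chat, so "read up to id N" implies everything earlier was read; that coalescing rule and the class name are assumptions, not something this session prescribes.

```python
class ReadReceiptBuffer:
    """Coalesces read receipts so each flush writes one row per (chat, user)."""

    def __init__(self) -> None:
        # (chat_id, user_id) -> highest message id read so far
        self._pending: dict = {}

    def mark_read(self, chat_id: str, user_id: str, msg_id: int) -> None:
        key = (chat_id, user_id)
        self._pending[key] = max(self._pending.get(key, 0), msg_id)

    def flush(self) -> list:
        """Return the batch to persist and reset the buffer."""
        batch = [(chat, user, msg_id) for (chat, user), msg_id in sorted(self._pending.items())]
        self._pending.clear()
        return batch


buf = ReadReceiptBuffer()
buf.mark_read("c1", "u1", 5)
buf.mark_read("c1", "u1", 9)  # supersedes the previous receipt
buf.mark_read("c1", "u2", 7)
print(buf.flush())
```

A timer or size threshold would normally decide when to call `flush`; that scheduling is left out here.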
Presence: ephemeral. Don't persist; expire from Redis after 30 s of no heartbeat. Online = there exists a recent heartbeat.
Typing: even more ephemeral; pub/sub broadcast only, never stored.
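The heartbeat rule for presence ("online = there exists a recent heartbeat") fits in a few lines. This sketch keeps timestamps in a dict and takes `now` as a parameter so it is easy to test; in Redis the same effect is usually achieved by writing a key with an expiry. The class name is hypothetical.

```python
class HeartbeatPresence:
    """Ephemeral presence: a user is online if a heartbeat arrived within ttl_s."""

    def __init__(self, ttl_s: float = 30.0) -> None:
        self.ttl_s = ttl_s
        self._last_beat: dict = {}  # user_id -> time of last heartbeat

    def heartbeat(self, user_id: str, now: float) -> None:
        self._last_beat[user_id] = now

    def is_online(self, user_id: str, now: float) -> bool:
        last = self._last_beat.get(user_id)
        return last is not None and now - last <= self.ttl_s

    def expire(self, now: float) -> list:
        """Drop stale entries; returns the users that just went offline."""
        stale = [u for u, t in self._last_beat.items() if now - t > self.ttl_s]
        for user_id in stale:
            del self._last_beat[user_id]
        return stale


presence = HeartbeatPresence(ttl_s=30.0)
presence.heartbeat("A", now=100.0)
print(presence.is_online("A", now=120.0), presence.is_online("A", now=131.0))
```

Nothing here is persisted, which matches the guidance above: losing presence state on restart only costs one heartbeat interval.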
10. Offline queue and push
When a recipient is offline:
- Persist the message (already done in step 3 of the routing flow).
- Increment unread count for that user.
- Send a push notification via APNs / FCM.
On reconnect: client requests "msgs > last_seen". Walks chat by chat.
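The offline path and the reconnect catch-up can be sketched together. Lists and dicts stand in for the push service, the unread counters, and message storage; the function names and the notification text are made up for illustration.

```python
def deliver(msg: dict, recipient_id: str, online_users: set,
            unread: dict, push_outbox: list, live_outbox: list) -> str:
    """Route one already-persisted message to a recipient."""
    if recipient_id in online_users:
        live_outbox.append((recipient_id, msg))
        return "pushed_live"
    # Offline: bump the unread count and hand off to APNs / FCM.
    unread[recipient_id] = unread.get(recipient_id, 0) + 1
    push_outbox.append((recipient_id, f"New message in {msg['chat_id']}"))
    return "queued_offline"


def catch_up(stored: dict, last_seen_by_chat: dict) -> dict:
    """On reconnect, walk chat by chat and return messages newer than last_seen."""
    missed = {}
    for chat_id, msgs in stored.items():
        newer = [m for m in msgs if m["id"] > last_seen_by_chat.get(chat_id, 0)]
        if newer:
            missed[chat_id] = newer
    return missed


unread, push_outbox, live_outbox = {}, [], []
status = deliver({"id": 5, "chat_id": "c1"}, "B", {"A"}, unread, push_outbox, live_outbox)
print(status, unread, push_outbox)
```

Because the message is persisted before delivery is attempted, a lost push notification only delays the message; the reconnect sync still picks it up.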
11. Group chat scaling
500-member group, one message:
- 1 storage write.
- ~500 presence lookups.
- Up to 500 connection pushes.
Easy at 1 message/sec; brutal at 100. Solutions:
- Mute / archive: skip unread-count updates for inactive members.
- Local server-side fanout — Conn Server X knows "I have 100 members of this chat connected" → one publish, local fanout.
- Server-relayed history for huge channels (Slack, Discord): clients pull on demand instead of receiving every msg in real-time.
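The "easy at 1 message/sec; brutal at 100" claim is just multiplication, and the same arithmetic shows what local fanout saves. In the sketch below, the assumption that the 500 members are spread over 5 connection servers is illustrative; the function name is hypothetical.

```python
def fanout_cost(members: int, msgs_per_sec: int, servers_with_members: int) -> dict:
    """Per-second work for one group under naive vs. server-grouped fanout."""
    return {
        "storage_writes": msgs_per_sec,                      # 1 write per message
        "presence_lookups": members * msgs_per_sec,          # ~1 lookup per member
        "client_pushes": members * msgs_per_sec,             # upper bound: all online
        "naive_server_nudges": members * msgs_per_sec,       # one nudge per member
        "grouped_server_nudges": servers_with_members * msgs_per_sec,
    }


quiet = fanout_cost(members=500, msgs_per_sec=1, servers_with_members=5)
busy = fanout_cost(members=500, msgs_per_sec=100, servers_with_members=5)
print(busy)
```

Grouping does not reduce the pushes to clients, only the traffic between the chat service and the connection servers; cutting client pushes is what the pull-on-demand approach for huge channels addresses.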
12. End-to-end encryption (Signal protocol)
Out of scope for the interview happy path, but you should mention:
- Double Ratchet — forward secrecy + post-compromise security.
- Server never sees plaintext; metadata only (who-to-whom, when, size).
- Multi-device sync requires per-device keys.
E2E is a heavy product decision — it breaks server-side search, content moderation, and some compliance.
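To give a feel for where forward secrecy comes from, here is a sketch of only the symmetric-key chain step of the Double Ratchet, using HMAC-SHA256 with the constants the specification suggests (0x01 for the message key, 0x02 for the next chain key). It deliberately omits the Diffie-Hellman ratchet, X3DH key agreement, and message encryption, and it is an illustration rather than something to deploy. The function name is hypothetical.

```python
import hashlib
import hmac


def kdf_chain_step(chain_key: bytes) -> tuple:
    """Derive (next_chain_key, message_key) from the current chain key."""
    message_key = hmac.new(chain_key, b"\x01", hashlib.sha256).digest()
    next_chain_key = hmac.new(chain_key, b"\x02", hashlib.sha256).digest()
    return next_chain_key, message_key


# Illustrative all-zero starting key; a real chain key comes from key agreement.
ck0 = b"\x00" * 32
ck1, mk1 = kdf_chain_step(ck0)
ck2, mk2 = kdf_chain_step(ck1)
print(mk1.hex()[:16], mk2.hex()[:16])
```

Each message gets its own key, and once the old chain key is deleted an attacker who later steals the current one cannot run HMAC backwards to recover earlier message keys.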
13. Things to bring up unprompted
- Snowflake-style IDs for messages (timestamp + worker + seq).
- Idempotency keys on the client side (avoid double-send on retry).
- Backpressure — if a Conn Server is overloaded, shed real-time pushes to its clients (with a "reconnect for history" hint) rather than queueing without bound.
- Multi-region — geo-route users to nearest region; replicate chats async across regions.
- Observability — p99 delivery latency, undelivered rate, fanout cost.
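The Snowflake-style id from the first bullet (timestamp + worker + seq) can be sketched as below. The 10-bit worker and 12-bit sequence split follows the commonly described Twitter layout and should be treated as an assumption; `now_ms` is passed in explicitly to keep the example deterministic, and this version raises on sequence exhaustion where a production generator would typically wait for the next millisecond.

```python
EPOCH_MS = 1_288_834_974_657  # custom epoch (the value Twitter's Snowflake used)
WORKER_BITS = 10
SEQ_BITS = 12
MAX_SEQ = (1 << SEQ_BITS) - 1


class SnowflakeGenerator:
    """Time-sortable 64-bit ids: [timestamp | worker | sequence]."""

    def __init__(self, worker_id: int) -> None:
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self._last_ms = -1
        self._seq = 0

    def next_id(self, now_ms: int) -> int:
        if now_ms < self._last_ms:
            raise RuntimeError("clock moved backwards")
        if now_ms == self._last_ms:
            if self._seq == MAX_SEQ:
                raise RuntimeError("sequence exhausted for this millisecond")
            self._seq += 1
        else:
            self._last_ms = now_ms
            self._seq = 0
        return (
            ((now_ms - EPOCH_MS) << (WORKER_BITS + SEQ_BITS))
            | (self.worker_id << SEQ_BITS)
            | self._seq
        )


def timestamp_ms(snowflake_id: int) -> int:
    """Recover the creation time embedded in an id."""
    return (snowflake_id >> (WORKER_BITS + SEQ_BITS)) + EPOCH_MS


gen = SnowflakeGenerator(worker_id=7)
first = gen.next_id(EPOCH_MS + 1000)
second = gen.next_id(EPOCH_MS + 1000)
print(first < second, timestamp_ms(first) - EPOCH_MS)
```

Ids from one worker are strictly increasing; ids from different workers are ordered only to millisecond precision, which is why per-chat ordering needs extra care.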
Reading material
Books:
- System Design Interview Volume 2 — Alex Xu (the chat system chapter is the canonical interview answer)
- Designing Data-Intensive Applications — Martin Kleppmann (the chapters on replication, partitioning, and stream processing: message brokers, exactly-once semantics)
- Building Microservices, 2nd ed. — Sam Newman (the asynchronous communication chapters)
Papers:
- The Double Ratchet Algorithm — Marlinspike & Perrin (Signal, 2016) — the E2EE protocol behind Signal/WhatsApp.
- Bayou: A Weakly Connected Replicated Storage System (SOSP 1995) — the academic roots of offline-first messaging.
- Cassandra: A Decentralized Structured Storage System — Lakshman & Malik 2010 — the storage layer Discord and Instagram messaging used.
Official docs:
- WebSocket Protocol RFC 6455 — the wire-level protocol behind real-time chat.
- XMPP Core RFC 6120 — the long-standing open IM standard; WhatsApp's wire protocol began as a customized XMPP.
- MQTT 5.0 spec — the lightweight pub/sub protocol Facebook Messenger adopted for mobile.
- Signal Protocol docs — X3DH, Double Ratchet, sealed sender.
Blog posts:
- WhatsApp Scaling — Rick Reed (Erlang Factory) — millions of connections per Erlang node.
- Discord — How Discord Stores Trillions of Messages — ScyllaDB migration deep-dive.
- Slack — Real-time messaging at scale — the WebSocket gateway architecture.
- Snowflake IDs — Twitter Engineering — the sortable distributed ID generator that many chat systems borrow.
In-depth research material
- Signal-iOS — github.com/signalapp/Signal-iOS — ~11k ★, the reference E2EE messenger implementation.
- Matrix — github.com/matrix-org — federated open-source chat protocol; rich design docs.
- Rocket.Chat — github.com/RocketChat/Rocket.Chat — ~41k ★, the open-source Slack alternative.
- Discord — How Discord Stores Billions of Messages (2017) — the Cassandra-era post.
- Discord — How Discord Handles Two and Half Million Concurrent Voice Users using WebRTC
- LinkedIn — Building real-time messaging at scale — push/pull hybrid for mobile.
- Uber — Real-time push platform — RAMEN protocol, mobile-first.
- ByteByteGo — Design WhatsApp/Messenger — the cleanest written walk-through.
- WebRTC overview — webrtc.org — P2P video/voice/data channels.
- Slack — Flannel: An application-level edge cache — user/channel metadata caching.
Videos
- System Design — Chat / WhatsApp / Messenger — ByteByteGo · 14 min — the interview-shaped walk-through; great anchor.
- Design a Chat System like WhatsApp — Gaurav Sen · 28 min — the classic system design YouTube walkthrough.
- Discord's Trillion-Message Database — ScyllaDB Summit — 30 min — the real story behind the migration; talk by the Discord engineers.
- WhatsApp — 1 server, 2 million connections — Rick Reed — 38 min — the Erlang Factory talk; the legendary FreeBSD + Erlang stack.
- How WebSockets Work — Hussein Nasser — 30 min — the transport detail nobody explains; useful before designing the chat gateway.
LeetCode — Design Chat System
- Link: https://leetcode.com/problems/design-chat-system/
- Difficulty: Medium
- Why this problem: Use Map<userId, conn> + Kafka-like fanout queue per chat; persist on delivery ack.
- Time-box: 30 minutes. Look up the editorial only after.
Post-session checklist
By the end of this session you should be able to:
- Sketch the boxes-and-arrows of a chat system in 5 minutes.
- Explain WebSocket vs MQTT trade-offs for mobile clients.
- Design the message storage schema and conversation index.
- Describe presence: where it lives, expiry, lookup cost.
- Handle a 500-member group msg without melting the brokers.
- Solve the design problem (heap-merge timelines) — chat fanout is structurally similar.
Generated from sessions_data.py + content_part*.py. To edit a video / leetcode / title, edit the data file and re-run write_sessions.py.