Search Tech Journey

Find topics, journeys and posts

back to blog
system designintermediate 12m2026-06-09

Designing a Chat System — Connections, Fanout, Storage, Delivery

Session 22 of the 48-session learning series.

Why this session matters

This is Session 22 of 48 in the System Design track. It builds on the rhythm of one focused topic, paced so you have time to actually absorb it rather than rush.

Agenda

  • Requirements — 1-1 chat, groups, presence, delivery receipts, history
  • Connection layer — long-poll vs WebSocket vs MQTT; sharding connections
  • Routing & fanout — chat servers, pub/sub, per-chat partitions
  • Storage — messages, conversation index, attachments, read receipts
  • Delivery semantics — exactly-once-ish, ordering, offline queue

Pre-read (skim before the session)

Deep dive

1. Sketching requirements

A "chat system" can mean many products. Typical interview / product scope:

  • 1-1 chat and group chat (≤500 members).
  • Presence (online / typing / last seen).
  • Read receipts.
  • History (search, pagination back N months).
  • Multi-device sync.
  • Push notifications when offline.
  • Optional: voice/video, e2e encryption, attachments.

Scale to design for: 100 M DAU, 100 K msgs/sec peak, 10:1 read:write, 5-second p99 delivery.

2. The core tension

Chat sits at the intersection of a persistent connection layer (stateful — you must know where a user is connected) and durable storage (stateless — distribute and replicate).

The clean separation:

[ Client ] ──┐
             ├── WebSocket ──▶ [ Connection Server ]
[ Client ] ──┘                        │
                                       │ pub/sub
                                       ▼
                                 [ Chat Service ]  ←──▶  [ Storage (KV / wide-col) ]
                                       │
                                       ▼
                                 [ Push Service ]   (APNs/FCM when offline)

3. Connection layer

Long polling — client opens HTTP; server holds until message or timeout (30 s). Works through every firewall. High overhead per message.

WebSocket — full-duplex; one TCP connection per device. Industry standard since 2014. ~100K connections per modest server.

MQTT — built for lossy networks (cellular). What WhatsApp uses on the wire; lighter than WebSocket for mobile.

HTTP/3 + WebTransport — newer; still rolling out.

For interviews: pick WebSocket. Mention MQTT for mobile.

4. Connection sharding

You'll have millions of connections. Distribute across thousands of connection servers. Each connection registered in a presence service:

presence: user_id → (connection_server, session_id)

Lookup: O(1) hash. Storage: Redis cluster or Bigtable-like KV.

When user A sends to user B, the chat service asks presence "where is B?" and pushes to that connection server.

5. Routing the message

1. Client A → Conn Server X: "send msg to B in chat C"
2. Conn X → Chat Service: persist + validate
3. Chat Service → Storage: append to messages(chat=C)
4. Chat Service → Pub/Sub: publish (chat=C, msg)
5. All Conn Servers subscribed to chat=C receive
6. Each looks up its local subscribers → push to them

Two pub/sub topologies:

  • Per-chat topics. Million chats = million topics. Brokers don't love this.
  • Per-connection-server fanout. Chat Service knows "chat C has members [A, D, E]"; looks up presence for each; sends individual nudges to relevant Conn Servers.

WhatsApp / Slack lean toward the second — looks up members per send.

6. Storage — message layout

A wide-column store (Cassandra, ScyllaDB, Bigtable, DynamoDB):

PK = (chat_id, time_bucket)
CK = (timestamp, msg_id)
columns: sender_id, body, attachments[], delivery, read_by[]

time_bucket (e.g., yearmonth) keeps partitions bounded. Within a bucket, sorted by timestamp → range scans for "load last 50 msgs" are local.

Discord: messages keyed by (channel_id, message_id) where message_id is a Snowflake encoding timestamp. They rebuilt their entire storage layer twice as they grew — first Cassandra, then ScyllaDB.

7. Conversation index

Per user, you need "list my chats with last message + unread count":

PK = user_id
CK = last_activity_ts desc
columns: chat_id, last_msg_preview, unread_count

Updated on every send/receive. Hot key risk for very active chats; mitigate with write batching.

8. Delivery semantics

Chat is at-least-once with dedup on the client. Server assigns a global message id (Snowflake-like). Client dedupes by id when sync'ing across devices.

Ordering: per-chat, monotonically increasing message id. Multi-device: each device tracks its last-seen id; on connect, requests "give me everything > last_seen".

9. Read receipts and presence

Read receipt = update to read_by set for a message. High write amplification (every read in a 500-member group = 500 set updates). Batch + debounce.

Presence: ephemeral. Don't persist; expire from Redis after 30 s of no heartbeat. Online = there exists a recent heartbeat.

Typing: even more ephemeral; pub/sub broadcast only, never stored.

10. Offline queue and push

When a recipient is offline:

  1. Persist the message (already done in step 3 above).
  2. Increment unread count for that user.
  3. Send a push notification via APNs / FCM.

On reconnect: client requests "msgs > last_seen". Walks chat by chat.

11. Group chat scaling

500-member group, one message:

  • 1 storage write.
  • ~500 presence lookups.
  • Up to 500 connection pushes.

Easy at 1 message/sec; brutal at 100. Solutions:

  • Mute / archive unread updates from inactive members.
  • Local server-side fanout — Conn Server X knows "I have 100 members of this chat connected" → one publish, local fanout.
  • Server-relayed history for huge channels (Slack, Discord): clients pull on demand instead of receiving every msg in real-time.

12. End-to-end encryption (Signal protocol)

Out of scope for the interview happy path, but you should mention:

  • Double Ratchet — forward secrecy + post-compromise security.
  • Server never sees plaintext; metadata only (who-to-whom, when, size).
  • Multi-device sync requires per-device keys.

E2E is a heavy product decision — it breaks server-side search, content moderation, and some compliance.

13. Things to bring up unprompted

  • Snowflake-style IDs for messages (timestamp + worker + seq).
  • Idempotency keys on the client side (avoid double-send on retry).
  • Backpressure — if a Conn Server is overloaded, shed messages to its clients (with a "reconnect for history" hint).
  • Multi-region — geo-route users to nearest region; replicate chats async across regions.
  • Observability — p99 delivery latency, undelivered rate, fanout cost.

Reading material

Books:

  • System Design Interview Volume 2 — Alex Xu (the chat system chapter is the canonical interview answer)
  • Designing Data-Intensive Applications — Martin Kleppmann (chs. on messaging, exactly-once, encryption)
  • Building Microservices, 2nd ed. — Sam Newman (the asynchronous communication chapters)

Papers:

Official docs:

Blog posts:

In-depth research material

Videos

LeetCode — Design Chat System

Post-session checklist

By the end of this session you should be able to:

  • Sketch the boxes-and-arrows of a chat system in 5 minutes.
  • Explain WebSocket vs MQTT trade-offs for mobile clients.
  • Design the message storage schema and conversation index.
  • Describe presence: where it lives, expiry, lookup cost.
  • Handle a 500-member group msg without melting the brokers.
  • Solve the design problem (heap-merge timelines) — chat fanout is structurally similar.

Generated from sessions_data.py + content_part*.py. To edit a video / leetcode / title, edit the data file and re-run write_sessions.py.