API Design — REST, GraphQL, gRPC, Versioning, Pagination, Errors
Session 31 of the 48-session learning series.
Date: Sat, 2026-07-04 · Time: 09:00–11:00 IST · Track: 🏗️ System Design (SYS) · Parent 28-day topic: Day 27 · Est. read: 2 h
Why this session matters
This is Session 31 of 48 in the System Design track. An API is the public face of your system. Bad APIs are renegotiated for years; good ones outlive the teams that built them. Knowing where REST shines, where GraphQL pays off, and where gRPC dominates is table-stakes for any senior engineer.
Agenda
- REST done well — resources, HTTP verbs, status codes, HATEOAS (or not)
- GraphQL — the schema, query, mutation, subscription model; n+1 trap
- gRPC + Protobuf — when binary + streaming wins
- API versioning, deprecation, pagination, errors
- Idempotency, rate limiting, auth — production hygiene
Pre-read (skim before the session)
- Roy Fielding — Architectural Styles (PhD thesis, 2000) — Ch. 5
- JSON:API spec
- Google API Design Guide
- GraphQL Best Practices
Deep dive
1. The 3 dominant API styles in 2026
| Style | Format | Schema | Streaming | Best at |
|---|---|---|---|---|
| REST | JSON over HTTP | Optional (OpenAPI) | Limited (SSE) | Public, cacheable resource APIs |
| GraphQL | JSON over HTTP | Mandatory (SDL) | Subscriptions | Mobile/web with varying needs |
| gRPC | Protobuf over HTTP/2 | Mandatory (proto) | Native bidi | Internal microservices, low-latency |
There is no "best". Pick by use-case. Most companies have all three.
2. REST done well
GET /v1/users/42 → fetch one
GET /v1/users?role=admin → list, query params for filter
POST /v1/users → create
PUT /v1/users/42 → replace
PATCH /v1/users/42 → partial update
DELETE /v1/users/42 → delete
Status codes matter:
- 2xx success: 200, 201 Created (with Location: /users/43), 204 No Content.
- 3xx redirect: rare in APIs.
- 4xx client errors: 400 (bad input), 401 (unauthenticated), 403 (forbidden), 404 (not found), 409 (conflict), 422 (validation), 429 (rate limit).
- 5xx server errors: 500, 502, 503 (overload), 504 (timeout).
Don't return 200 OK with {"error": "..."}. That breaks every client retry policy on the planet.
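To see why the status line matters, here is a minimal sketch of the kind of decision a generic HTTP client makes from the status code alone. The names should_retry and classify, and the exact set of retryable codes, are assumptions for illustration and are not taken from any particular client library.

```python
# Hypothetical client-side retry policy keyed on the HTTP status code only.
RETRYABLE = {429, 502, 503, 504}  # assumption: transient failures worth retrying


def should_retry(status: int) -> bool:
    """Return True when a request that got `status` is worth retrying."""
    return status in RETRYABLE


def classify(status: int) -> str:
    """Bucket a final status code into the four families listed above."""
    if 200 <= status < 300:
        return "success"
    if 300 <= status < 400:
        return "redirect"
    if 400 <= status < 500:
        return "client_error"
    if 500 <= status < 600:
        return "server_error"
    raise ValueError(f"not a final HTTP status: {status}")


# An overloaded server that answers 503 gets retried...
print(should_retry(503), classify(503))
# ...but the same failure disguised as 200 + {"error": ...} looks like success.
print(should_retry(200), classify(200))
```

A client built this way never inspects the body before deciding, which is why an error hidden behind 200 is silently treated as a success.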
3. Resource shape
- Plural noun for collections: /users, not /user.
- Sub-resources for relationships: /users/42/orders.
- Don't put verbs in URLs (/getUser, /activateUser); use HTTP verbs and resource state.
- Reserved exception: actions that don't map to CRUD, e.g. POST /users/42:resetPassword (Google-style colon-action).
4. GraphQL — the pitch
One endpoint (/graphql). Client describes exactly what it needs:
query {
  user(id: "42") {
    name
    orders(last: 5) {
      id
      total
      items { name price }
    }
  }
}
Pros:
- No over-fetching (mobile users save bandwidth).
- No under-fetching (fewer round-trips).
- Typed schema; clients can codegen.
- Single endpoint; easier ops.
Cons:
- N+1 query problem (must use DataLoader-style batching).
- Caching is harder (no URL → response mapping).
- Authorisation is per-field, not per-endpoint (more places to mess up).
- Costly arbitrary queries — depth/complexity limits required.
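The N+1 problem and its batching fix can be sketched without a GraphQL server. In the example below everything is hypothetical: ORDERS stands in for a database table, stats counts simulated round-trips, and BatchLoader is a simplified stand-in for a DataLoader-style helper, not the API of any real library.

```python
from collections import defaultdict

# Hypothetical in-memory "orders table": (order_id, user_id) rows.
ORDERS = [(1, "a"), (2, "a"), (3, "b"), (4, "c")]
stats = {"queries": 0}  # counts simulated database round-trips


def orders_for_user(user_id):
    """One query per user: the N+1 pattern when called in a loop."""
    stats["queries"] += 1
    return [oid for oid, uid in ORDERS if uid == user_id]


def orders_for_users(user_ids):
    """One query for all users, like SELECT ... WHERE user_id IN (...)."""
    stats["queries"] += 1
    wanted = set(user_ids)
    grouped = defaultdict(list)
    for oid, uid in ORDERS:
        if uid in wanted:
            grouped[uid].append(oid)
    # Return results in the same order as the requested keys.
    return [grouped[uid] for uid in user_ids]


class BatchLoader:
    """Collects keys first, then resolves them with a single batch call."""

    def __init__(self, batch_fn):
        self.batch_fn = batch_fn
        self.pending = []

    def load(self, key):
        if key not in self.pending:
            self.pending.append(key)

    def dispatch(self):
        keys, self.pending = self.pending, []
        return dict(zip(keys, self.batch_fn(keys)))


users = ["a", "b", "c"]

stats["queries"] = 0
naive = {u: orders_for_user(u) for u in users}
naive_queries = stats["queries"]  # one query per user

stats["queries"] = 0
loader = BatchLoader(orders_for_users)
for u in users:
    loader.load(u)
batched = loader.dispatch()
batched_queries = stats["queries"]  # a single query for all users

print(naive_queries, batched_queries, naive == batched)
```

Real DataLoader implementations usually collect keys for one tick of the event loop and dispatch automatically; the explicit dispatch() call here only keeps the sketch synchronous.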
5. gRPC — when binary wins
Protobuf-defined service:
service UserService {
  rpc GetUser(GetUserRequest) returns (User);
  rpc StreamEvents(EventFilter) returns (stream Event);
  rpc UploadFile(stream FileChunk) returns (UploadResult);
  rpc Chat(stream Message) returns (stream Message);
}
Pros:
- Strongly typed, generated stubs in 10+ languages.
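- Binary on the wire: often quoted as 3–10× smaller than JSON, though the ratio depends on the payload.
- HTTP/2 multiplexing: many calls share one connection, which removes HTTP-level head-of-line blocking (TCP-level blocking remains).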
- Native streaming (unary, server, client, bidi).
Cons:
- Not browser-native — needs grpc-web proxy.
- Binary = harder to debug with curl (use grpcurl).
- HTTP/2 issues with some old infrastructure.
A sensible default for internal microservices in any polyglot organisation.
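To make the size argument concrete, the sketch below hand-encodes a tiny message in the Protobuf wire format (varints plus a tag per field) and compares it with compact JSON. This is an illustration only: it does not use the protobuf library, the User message with fields id = 1 and name = 2 is a hypothetical schema, and the helper names are made up for this example.

```python
import json


def encode_varint(n: int) -> bytes:
    """Base-128 varint: 7 payload bits per byte, high bit set on all but the last."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_int_field(field_number: int, value: int) -> bytes:
    """Wire type 0 (varint): tag followed by the value."""
    return encode_varint((field_number << 3) | 0) + encode_varint(value)


def encode_string_field(field_number: int, value: str) -> bytes:
    """Wire type 2 (length-delimited): tag, byte length, then UTF-8 bytes."""
    data = value.encode("utf-8")
    return encode_varint((field_number << 3) | 2) + encode_varint(len(data)) + data


# Hypothetical schema: message User { int64 id = 1; string name = 2; }
binary = encode_int_field(1, 42) + encode_string_field(2, "Ada")
as_json = json.dumps({"id": 42, "name": "Ada"}, separators=(",", ":")).encode("utf-8")

print(binary.hex(), len(binary), len(as_json))  # 082a1203416461 7 22
```

The saving comes mostly from replacing field names with one-byte tags, so the ratio shrinks for payloads dominated by long strings.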
6. Versioning
Three schools:
- URL versioning: /v1/users, /v2/users. Most popular, easiest to reason about, ugliest.
- Header versioning: Accept: application/vnd.myapi.v2+json. Cleaner URL, harder to debug.
- No versioning, only deprecation: additive changes only; never break. Stripe popularised a close variant (additive changes by default, with dated versions reserved for the rare breaking change); high discipline required.
For a startup → URL versioning. Migrate to Stripe-style additive once you have public partners.
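As a rough illustration of the first two schools, a server can resolve the requested version from either the URL prefix or the Accept header. The function name resolve_version, the precedence order, and the fallback to version 1 are assumptions made for this sketch.

```python
import re

DEFAULT_VERSION = 1  # assumption: unversioned requests get the oldest supported version
_PATH_RE = re.compile(r"^/v(\d+)(?=/|$)")
_ACCEPT_RE = re.compile(r"application/vnd\.myapi\.v(\d+)\+json")


def resolve_version(path: str, accept: str = "") -> int:
    """Prefer a /vN path prefix, then a vendor media type in Accept, then the default."""
    m = _PATH_RE.match(path)
    if m:
        return int(m.group(1))
    m = _ACCEPT_RE.search(accept)
    if m:
        return int(m.group(1))
    return DEFAULT_VERSION


print(resolve_version("/v2/users"))                               # 2
print(resolve_version("/users", "application/vnd.myapi.v2+json"))  # 2
print(resolve_version("/users", "application/json"))              # 1
```

Supporting both at once is rarely worth it in practice; the point is only that the header variant needs parsing the URL variant gets for free from the router.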
7. Pagination
| Style | How | Pros | Cons |
|---|---|---|---|
| Offset/limit | ?page=2&size=20 | Simple, jump to page | Slow on deep pages; rows are skipped or repeated when items are inserted or deleted between requests |
| Cursor | ?cursor=abc&size=20 | Stable across inserts; fast | No jump-to-page; opaque cursor |
| Keyset (seek) | ?after_id=12345&size=20 | Fastest; index-friendly | Sortable column required |
For infinite scroll or sync APIs → cursor. For human-facing tables → offset. Don't allow page=100000 — set a cap.
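The difference between the rows of the table can be sketched with an in-memory SQLite database. The users table, the list_users function, the cursor encoding (base64 of a small JSON object), and the MAX_PAGE_SIZE cap are all assumptions for illustration; the sketch also assumes ids are positive integers.

```python
import base64
import json
import sqlite3

MAX_PAGE_SIZE = 100  # assumption: server-side cap on page size


def encode_cursor(last_id: int) -> str:
    """Wrap the keyset position in an opaque, URL-safe token."""
    raw = json.dumps({"after_id": last_id}).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(cursor: str) -> int:
    return json.loads(base64.urlsafe_b64decode(cursor.encode("ascii")))["after_id"]


def list_users(conn, cursor=None, size=20):
    """Return (rows, next_cursor); next_cursor is None on the last page."""
    size = max(1, min(size, MAX_PAGE_SIZE))
    after_id = decode_cursor(cursor) if cursor else 0
    # Fetch one extra row to learn whether another page exists.
    rows = conn.execute(
        "SELECT id, name FROM users WHERE id > ? ORDER BY id LIMIT ?",
        (after_id, size + 1),
    ).fetchall()
    page = rows[:size]
    next_cursor = encode_cursor(page[-1][0]) if len(rows) > size else None
    return page, next_cursor


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO users (id, name) VALUES (?, ?)",
    [(i, f"user{i}") for i in range(1, 6)],
)

page1, next_cur = list_users(conn, size=2)
# A row disappears between the first and second request.
conn.execute("DELETE FROM users WHERE id = 1")
page2, _ = list_users(conn, cursor=next_cur, size=2)
offset_page2 = conn.execute(
    "SELECT id FROM users ORDER BY id LIMIT 2 OFFSET 2"
).fetchall()

print(page1, page2)    # the cursor continues at ids 3 and 4
print(offset_page2)    # offset paging jumps to ids 4 and 5, skipping 3
```

The cursor here is just a keyset position in an opaque wrapper, which is a common way to get keyset performance while keeping the freedom to change the sort column later.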
8. Error responses
Consistent shape:
{
  "error": {
    "code": "USER_NOT_FOUND",
    "message": "User 42 not found",
    "details": [
      {"field": "user_id", "issue": "no_such_record"}
    ],
    "trace_id": "abc-123"
  }
}
- code: machine-readable enum; never change.
- message: human-readable; safe to display.
- details: per-field issues for form validation.
- trace_id: for support tickets.
Document every code. Clients will switch on it.
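One way to keep the shape consistent is to build every error through a single helper. The sketch below assumes a hypothetical ErrorCode catalogue and an assumed mapping from code to HTTP status; neither comes from a specific framework.

```python
import json
import uuid
from enum import Enum


class ErrorCode(str, Enum):
    """Hypothetical catalogue of stable, documented error codes."""
    USER_NOT_FOUND = "USER_NOT_FOUND"
    VALIDATION_FAILED = "VALIDATION_FAILED"


# Assumed mapping from error code to HTTP status.
STATUS = {ErrorCode.USER_NOT_FOUND: 404, ErrorCode.VALIDATION_FAILED: 422}


def error_response(code, message, details=None, trace_id=None):
    """Return (http_status, json_body) in the envelope shown above."""
    body = {
        "error": {
            "code": code.value,
            "message": message,
            "details": details or [],
            "trace_id": trace_id or str(uuid.uuid4()),
        }
    }
    return STATUS[code], json.dumps(body)


status, body = error_response(
    ErrorCode.USER_NOT_FOUND,
    "User 42 not found",
    details=[{"field": "user_id", "issue": "no_such_record"}],
    trace_id="abc-123",
)
print(status, body)

# A client switches on the stable code, never on the message text.
if json.loads(body)["error"]["code"] == ErrorCode.USER_NOT_FOUND.value:
    print("show 'user not found' screen")
```

Keeping the codes in one enum also gives the documentation a single place to be generated from.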
9. Idempotency
A retried POST should not create two orders. Implement via:
- Client supplies an Idempotency-Key: <uuid> header.
- Server stores key → response for 24h.
- Same key + same body → return cached response.
- Same key + different body → 409 Conflict.
This is the Stripe-style approach. Treat it as mandatory for any POST with side effects, especially one that moves money.
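The four rules above can be sketched with an in-memory store. IdempotencyStore, its handle method, the IDEMPOTENCY_KEY_REUSED code, and the create_order handler are hypothetical names; a real service would keep the entries in a shared store and would also need to handle two requests with the same key arriving concurrently, which this sketch does not.

```python
import hashlib
import json
import time

TTL_SECONDS = 24 * 60 * 60  # keep key -> response for 24h, as in the list above


class IdempotencyStore:
    """In-memory sketch; not safe across processes or concurrent requests."""

    def __init__(self, clock=time.time):
        self.clock = clock
        self.entries = {}  # key -> (expires_at, body_hash, response)

    def handle(self, key, body, create):
        """Run `create(body)` at most once per key within the TTL."""
        now = self.clock()
        body_hash = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode("utf-8")
        ).hexdigest()
        entry = self.entries.get(key)
        if entry and entry[0] > now:
            if entry[1] != body_hash:
                # Same key, different body: refuse instead of guessing.
                return 409, {"error": {"code": "IDEMPOTENCY_KEY_REUSED"}}
            return entry[2]  # same key, same body: replay the cached response
        response = create(body)
        self.entries[key] = (now + TTL_SECONDS, body_hash, response)
        return response


orders = []


def create_order(body):
    """Hypothetical side-effecting handler behind POST /orders."""
    orders.append(body)
    return 201, {"order_id": len(orders)}


store = IdempotencyStore()
first = store.handle("key-1", {"sku": "A", "qty": 1}, create_order)
retry = store.handle("key-1", {"sku": "A", "qty": 1}, create_order)
clash = store.handle("key-1", {"sku": "B", "qty": 1}, create_order)
print(first, retry, clash[0], len(orders))
```

Hashing the body rather than storing it keeps the entry small while still detecting a key reused for a different request.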
10. Rate limiting
Two strategies often combined:
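- Per-API-key quota (X-RateLimit-Remaining header).
- Global per-resource burst (token bucket).
Headers (X-RateLimit-* is a widely used convention rather than a formal standard; Retry-After is defined by the HTTP specification):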
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 23
X-RateLimit-Reset: 1735689600
Retry-After: 30
Return 429 Too Many Requests + Retry-After. Clients can back off intelligently.
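A token bucket that produces these headers can be sketched in a few lines. The TokenBucket class, its check method, and the injectable clock are assumptions for illustration; the sketch covers a single bucket in one process, whereas a real deployment would keep one bucket per API key in a shared store.

```python
import math
import time


class TokenBucket:
    """Token bucket: refills at `rate` tokens/second up to `capacity`."""

    def __init__(self, capacity, rate, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def _refill(self):
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now

    def check(self):
        """Return (status, headers) for one incoming request."""
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1
            headers = {
                "X-RateLimit-Limit": str(self.capacity),
                "X-RateLimit-Remaining": str(int(self.tokens)),
            }
            return 200, headers
        # Seconds until one whole token is available, rounded up.
        wait = math.ceil((1 - self.tokens) / self.rate)
        return 429, {"Retry-After": str(wait)}


# A fake clock keeps the demo deterministic.
now = [0.0]
bucket = TokenBucket(capacity=2, rate=0.5, clock=lambda: now[0])
results = [bucket.check()[0] for _ in range(3)]  # burst of 3 at t=0
now[0] += 2.0                                    # 2 s later: one token refilled
results.append(bucket.check()[0])
print(results)  # [200, 200, 429, 200]
```

The capacity sets how large a burst is tolerated and the rate sets the sustained throughput, which is why the two are usually tuned separately.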
11. Auth
Common patterns:
- API keys — simple, machine-to-machine. Rotate periodically.
- OAuth 2.0: delegated authorisation with user consent (the flow underneath "login with Google", where OIDC adds the identity layer).
- JWT bearer — short-lived; signed claims; stateless verification.
- mTLS — service-to-service; cert-based; zero password leakage.
- OIDC + JWT — modern combo for SaaS B2B.
Don't roll your own auth. Use Auth0/Clerk/Cognito/Keycloak.
12. Documentation
Non-negotiable artefacts:
- OpenAPI spec (REST) — generated from code; renders in Swagger UI / ReDoc.
- GraphQL introspection — schema browsable in GraphiQL.
- .proto files (gRPC): generate docs via protoc-gen-doc.
- Postman / Bruno collection: examples that actually run.
- Changelog — every breaking change, with a date.
API without docs = API that doesn't exist.
13. Reality check
A modern API stack:
- REST + OpenAPI for public.
- gRPC for internal high-traffic services.
- GraphQL for mobile/web aggregation layer (BFF pattern).
- A gateway in front (Kong, Envoy) for auth, rate limit, observability.
- Versioning at URL.
- Postman + Swagger for docs.
You don't need all three. Most teams should pick REST first and add the others when actual pain demands it.
Reading material
- Roy Fielding — Architectural Styles (REST origins)
- Google API Design Guide
- GraphQL Best Practices
- Stripe API Reference (style guide reference)
In-depth research material
Video reference
▶︎ REST vs GraphQL vs gRPC (System Design Mastery)
Pick a quiet 30 minutes during this session to actually watch it. Don't multitask.
LeetCode — Design Rate Limiter
- Link: https://leetcode.com/problems/design-rate-limiter/
- Difficulty: Medium
- Why this problem: Token bucket / sliding window — the canonical API hygiene primitive.
- Time-box: 30 minutes. Look up the editorial only after.
Post-session checklist
By the end of this session you should be able to:
- Pick REST, GraphQL, or gRPC for a given use-case with one-sentence justification.
- List the 4xx codes for: invalid input, missing or invalid credentials, insufficient permissions, conflict, rate-limit.
- Design an idempotent POST endpoint with Idempotency-Key.
- Explain cursor vs offset pagination and when each fails.
- Write an error response with code, message, details, trace_id.
- Solve design-rate-limiter: sliding-window or token-bucket implementation.