Day 10 — Concurrency Models — Threads, Asyncio, GIL, Actors
Every backend you build will block on IO or compute. Knowing *which* concurrency model to pick (and *why*) can cut latency by as much as 10× and prevents the classes of bugs…
Concurrency is about structure (running multiple tasks); parallelism is about execution (running them simultaneously). Pick the model that matches your workload — IO-bound vs CPU-bound — and the bugs/perf both fall out.
🧠 Concept
Why it matters & the mental model.
1. The four-quadrant choice
2. Threads & the GIL
CPython has a Global Interpreter Lock (GIL): only one thread executes Python bytecode at a time within a process. So:
- IO-bound (sockets, files): threads work great, GIL is released during IO syscalls.
- CPU-bound (pure Python loops, NumPy fallback): threads do nothing for you; multiprocessing or native code wins.
- C-extension CPU work (NumPy, PyTorch, BLAS): the GIL is released inside the extension → threads do help. Python 3.13's experimental free-threaded build removes the GIL but it's not the default yet.
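To make the IO-bound case concrete, here is a minimal sketch that compares a sequential loop with a thread pool. The function fake_io is a hypothetical stand-in for a blocking socket or file call; it uses time.sleep, which releases the GIL the same way a real IO syscall does. Exact timings will vary by machine.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(task_id: int, delay: float = 0.05) -> int:
    """Hypothetical blocking IO call; time.sleep releases the GIL while waiting."""
    time.sleep(delay)
    return task_id


def run_sequential(n: int) -> tuple[list[int], float]:
    """Run n fake IO calls one after another and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = [fake_io(i) for i in range(n)]
    return results, time.perf_counter() - start


def run_threaded(n: int) -> tuple[list[int], float]:
    """Run n fake IO calls on n threads and return (results, elapsed seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n) as pool:
        results = list(pool.map(fake_io, range(n)))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    _, seq = run_sequential(8)
    _, thr = run_threaded(8)
    print(f"sequential: {seq:.2f}s, threaded: {thr:.2f}s")
```

Swapping fake_io for a pure-Python loop should erase most of the gap, because the threads would then contend for the GIL instead of waiting on IO.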
3. Asyncio — cooperative concurrency
A single event loop multiplexes thousands of coroutines. No threads, no locks (mostly), but one slow call blocks everything. Rules:
- Never call sync IO from async code. Use asyncio.to_thread(blocking_fn) to offload.
- Don't time.sleep; use await asyncio.sleep.
- Wrap shared state with asyncio.Lock if multiple coroutines mutate it.
- Use asyncio.TaskGroup (3.11+) for structured concurrency with proper exception propagation.
4. Multiprocessing — true parallelism
Separate Python processes, each with its own GIL → can saturate all cores on CPU work. Costs: process startup (~100 ms), IPC serialisation (pickle), no shared memory by default. Use multiprocessing.Pool, concurrent.futures.ProcessPoolExecutor, or libraries (joblib, ray, dask).
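As a rough illustration, the sketch below spreads a deliberately naive CPU-bound function across worker processes with ProcessPoolExecutor. The function names and input sizes are made up for the example; whether the pool actually beats a single process depends on how the work compares with startup and pickling overhead.

```python
from concurrent.futures import ProcessPoolExecutor


def count_primes(limit: int) -> int:
    """Deliberately naive, pure-Python CPU work: count the primes below `limit`."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def count_primes_parallel(limits: list[int], workers: int = 4) -> list[int]:
    """Run count_primes on each limit in a separate worker process."""
    # Arguments and results cross the process boundary via pickle.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(count_primes, limits))


if __name__ == "__main__":  # guard so spawned workers can re-import this module safely
    print(count_primes_parallel([10_000, 20_000, 30_000, 40_000]))
```

The worker function lives at module level because it has to be picklable; a lambda or nested function would fail here.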
🛠 Deep Dive
Internals, code, architecture.
5. Executors — the easy button
concurrent.futures gives a uniform API:
ThreadPoolExecutor for IO-bound. ProcessPoolExecutor for CPU-bound. as_completed(futures) streams results as they arrive; map preserves order. Set max_workers deliberately (≈2-4 × cores for IO, cores for CPU).
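The difference between map and as_completed is easiest to see side by side. In this sketch, fetch is a hypothetical IO call whose latency equals its argument, so the completion order is predictable.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch(delay: float) -> float:
    """Hypothetical IO call that takes `delay` seconds and returns it."""
    time.sleep(delay)
    return delay


def in_submission_order(delays: list[float]) -> list[float]:
    with ThreadPoolExecutor(max_workers=len(delays)) as pool:
        return list(pool.map(fetch, delays))  # map yields results in input order


def in_completion_order(delays: list[float]) -> list[float]:
    with ThreadPoolExecutor(max_workers=len(delays)) as pool:
        futures = [pool.submit(fetch, d) for d in delays]
        # as_completed yields each future as soon as it finishes.
        return [f.result() for f in as_completed(futures)]


if __name__ == "__main__":
    delays = [0.3, 0.05, 0.15]
    print("map:         ", in_submission_order(delays))
    print("as_completed:", in_completion_order(delays))
```

Prefer as_completed when you can act on each result immediately, and map when downstream code needs results aligned with the inputs.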
6. The actor model
Each actor owns its state and a mailbox; messages are processed sequentially per actor. No shared memory → no locks. Used by Erlang/Elixir (OTP), Akka (JVM), Orleans (.NET). Great fit for stateful agents, IoT, game servers. Failure isolation is built in: let a failing actor crash and have its supervisor restart it.
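Python has no built-in actor runtime, but the core idea can be sketched with a thread and a queue. CounterActor below is a toy, hypothetical class: it shows private state plus a sequentially drained mailbox, and leaves out supervision and restarts entirely.

```python
import queue
import threading


class CounterActor:
    """Toy actor: private state, a mailbox, and one thread that drains it in order."""

    def __init__(self) -> None:
        self._count = 0  # touched only by the actor's own thread
        self._mailbox: queue.Queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self) -> None:
        while True:
            message, reply_to = self._mailbox.get()
            if message == "stop":
                break
            if message == "increment":
                self._count += 1
            elif message == "get":
                reply_to.put(self._count)

    def send(self, message: str) -> None:
        """Fire-and-forget message; the caller never touches actor state."""
        self._mailbox.put((message, None))

    def get_count(self) -> int:
        """Request/reply: enqueue a 'get' and block until the actor answers."""
        reply_to: queue.Queue = queue.Queue(maxsize=1)
        self._mailbox.put(("get", reply_to))
        return reply_to.get()

    def stop(self) -> None:
        self.send("stop")
        self._thread.join()
```

Any number of threads can call send concurrently without a lock in user code, because only the actor's thread ever reads or writes _count.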
7. The async pitfalls people learn the hard way
- Forgotten await → coroutine object never runs; nothing happens and no exception is raised, only a "coroutine ... was never awaited" RuntimeWarning that is easy to miss.
- Mixing sync and async libs → silent blocking. Always check "is this lib async-native?".
- CPU work in event loop → tail latency explodes. Offload to a thread/process.
- Unbounded concurrency → fan out to 10k requests, crash the server. Use asyncio.Semaphore or a worker pool.
- Cancellation → tasks left running after a timeout. Use asyncio.TaskGroup or wrap with asyncio.timeout().
8. Memory model basics (not just Python)
- Happens-before: a write in thread A is visible to thread B only with synchronisation (lock, atomic, barrier).
- volatile in Java / std::atomic in C++ ≠ mutex: volatile gives visibility and ordering only; std::atomic adds atomic read-modify-write on a single variable, but neither protects a multi-step or multi-variable update.
- Data race vs race condition: data race = unsynchronised concurrent access to the same memory with at least one write (undefined behaviour in C and C++); race condition = ordering bug (semantically wrong).
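The lost-update problem behind these definitions can be sketched in Python with a shared counter. The Counter class and hammer helper are hypothetical; the unsafe path may or may not lose updates on a given run, since that depends on how the interpreter schedules threads.

```python
import threading


class Counter:
    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def unsafe_increment(self) -> None:
        # Read and write are separate steps: two threads can read the same old value
        # and one of the two updates is then lost.
        current = self.value
        self.value = current + 1

    def safe_increment(self) -> None:
        with self._lock:  # the lock makes the compound read-modify-write atomic
            self.value += 1


def hammer(increment, threads: int = 8, per_thread: int = 10_000) -> None:
    """Call `increment` per_thread times from each of `threads` threads."""
    def loop() -> None:
        for _ in range(per_thread):
            increment()

    workers = [threading.Thread(target=loop) for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()


if __name__ == "__main__":
    safe, unsafe = Counter(), Counter()
    hammer(safe.safe_increment)
    hammer(unsafe.unsafe_increment)
    print("locked:", safe.value, "unlocked:", unsafe.value)
```

The locked counter always reaches the full total; the unlocked one can only come up short, never over, because a lost update overwrites a newer value with an older one plus one.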
🚀 In Practice
Trade-offs, exercises, what to ship today.
9. Higher-level patterns
- Producer-consumer with bounded queue (backpressure).
- Pipeline of async stages (each stage a coroutine with input/output channels).
- Fan-out / fan-in for parallel sub-tasks + gather.
- Circuit breaker to stop hammering a failing downstream.
- Rate limiter (token bucket) for fairness and external API quotas.
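As one example from this list, producer-consumer with backpressure can be sketched with a bounded asyncio.Queue. The function names and the None sentinel are choices made for this illustration, not a fixed convention.

```python
import asyncio


async def producer(q: asyncio.Queue, n: int) -> None:
    for i in range(n):
        await q.put(i)  # suspends while the queue is full: this is the backpressure
    await q.put(None)  # sentinel: no more items


async def consumer(q: asyncio.Queue, out: list[int]) -> None:
    while True:
        item = await q.get()
        if item is None:
            break
        await asyncio.sleep(0)  # stand-in for per-item work
        out.append(item * item)


async def run_pipeline(n: int, maxsize: int = 4) -> list[int]:
    """Connect one producer to one consumer through a queue of at most `maxsize` items."""
    q: asyncio.Queue = asyncio.Queue(maxsize=maxsize)
    out: list[int] = []
    await asyncio.gather(producer(q, n), consumer(q, out))
    return out


if __name__ == "__main__":
    print(asyncio.run(run_pipeline(10)))
```

With maxsize set, a slow consumer throttles the producer instead of letting the queue grow without limit. Chaining several such stages gives the pipeline pattern from the same list.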
10. Debugging concurrency
- py-spy top --pid <pid> for a live view of a running process; py-spy record -o profile.svg --pid <pid> for a flame graph.
- asyncio.run(main(), debug=True) warns on slow callbacks.
- threading.settrace, faulthandler for deadlocks.
- Always log task identity (asyncio.current_task().get_name()) when chasing weird interleavings.
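A minimal sketch of the last three tips together: named tasks in log lines, asyncio debug mode, and a faulthandler watchdog. The worker names and the 30-second watchdog delay are arbitrary choices for the example.

```python
import asyncio
import faulthandler
import logging

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("demo")


async def step(seen: list[str]) -> None:
    name = asyncio.current_task().get_name()  # task identity for every log line
    log.info("[%s] start", name)
    await asyncio.sleep(0)
    log.info("[%s] done", name)
    seen.append(name)


async def main() -> list[str]:
    seen: list[str] = []
    tasks = [asyncio.create_task(step(seen), name=f"worker-{i}") for i in range(3)]
    await asyncio.gather(*tasks)
    return seen


if __name__ == "__main__":
    # If the program is still running after 30 s (e.g. deadlocked),
    # print the stack of every thread to stderr.
    faulthandler.dump_traceback_later(30)
    asyncio.run(main(), debug=True)  # debug mode logs callbacks slower than 100 ms
    faulthandler.cancel_dump_traceback_later()
```

Naming tasks at creation time costs nothing and makes interleaved log output much easier to untangle than the default Task-N names.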
11. Distributed extension
Once you outgrow one machine, the same models scale:
- Threads → thread pool per service.
- Asyncio → many services on one box.
- Multiprocessing → workers / pods.
- Actors → Akka / Orleans cluster or Ray actors.
12. What to take away
"Why is this slow?" Strong answers identify IO-bound vs CPU-bound first, then suggest the matching model. Mentioning the GIL without that context is a yellow flag — it's a tool, not a curse.
Resources
- 🎥 David Beazley — Understanding the Python GIL
- 📖 Real Python — Asyncio Walkthrough
- 📖 Java Concurrency in Practice — chapter excerpts
- 📖 Akka / Erlang — Actor model intro (Joe Armstrong talk)
Practice Problem: Design Hit Counter (Medium)