engineering · advanced · 12 min · 2026-06-08

Day 10 — Concurrency Models — Threads, Asyncio, GIL, Actors

Every backend you build will block on IO or compute. Knowing *which* concurrency model to pick (and *why*) can cut latency by up to 10× and prevents the classes of bugs…

Concurrency is about structure (dealing with multiple tasks at once); parallelism is about execution (running them simultaneously). Pick the model that matches your workload, IO-bound vs CPU-bound, and most of the bug and performance questions answer themselves.

🧠 Concept

Why it matters & the mental model.

1. The four-quadrant choice

2. Threads & the GIL

CPython has a Global Interpreter Lock (GIL): only one thread executes Python bytecode at a time per process. So:

  • IO-bound (sockets, files): threads work great, GIL is released during IO syscalls.
  • CPU-bound (pure Python loops, NumPy code that falls back to Python-level loops): threads give no speed-up; multiprocessing or native code wins.
  • C-extension CPU work (NumPy, PyTorch, BLAS): many extensions release the GIL inside their native code → threads do help.
  • Python 3.13 ships an experimental free-threaded build that removes the GIL, but it is not the default yet.
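
To make the IO-bound vs CPU-bound split concrete, here is a minimal sketch that runs both kinds of work on a thread pool. The names fake_io, cpu_work and timed are made up for illustration, and time.sleep only stands in for a real socket or file call. Exact timings depend on your machine and on whether you run a free-threaded build.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(delay: float) -> float:
    """Stand-in for a blocking network call; time.sleep releases the GIL."""
    time.sleep(delay)
    return delay


def cpu_work(n: int) -> int:
    """Pure-Python loop; on a standard CPython build it holds the GIL while it runs."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def timed(fn, args, workers: int) -> tuple[list, float]:
    """Run fn over args on a thread pool and return (results, elapsed seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(fn, args))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    _, io_elapsed = timed(fake_io, [0.2] * 4, workers=4)
    print(f"4 x 0.2 s of simulated IO on 4 threads: {io_elapsed:.2f} s")

    _, cpu_serial = timed(cpu_work, [1_000_000] * 4, workers=1)
    _, cpu_threaded = timed(cpu_work, [1_000_000] * 4, workers=4)
    print(f"CPU work on 1 thread: {cpu_serial:.2f} s, on 4 threads: {cpu_threaded:.2f} s")
```

On a standard GIL build you would expect the simulated IO to overlap almost perfectly, while the CPU-bound run on 4 threads takes roughly as long as on 1.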

3. Asyncio — cooperative concurrency

A single event loop multiplexes thousands of coroutines. No threads, no locks (mostly), but one slow call blocks everything. Rules:

  • Never call sync IO from async code. Use asyncio.to_thread(blocking_fn) to offload.
  • Don't time.sleep; use await asyncio.sleep.
  • Wrap shared state with asyncio.Lock if multiple coroutines mutate it across await points; code between two awaits already runs without interruption.
  • Use asyncio.TaskGroup (3.11+) for structured concurrency with proper exception propagation.

4. Multiprocessing — true parallelism

Separate Python processes, each with its own GIL → can saturate all cores on CPU work. Costs: process startup (on the order of 100 ms, depending on start method and imports), IPC serialisation (pickle), no shared memory by default (multiprocessing.shared_memory is opt-in). Use multiprocessing.Pool, concurrent.futures.ProcessPoolExecutor, or libraries (joblib, ray, dask).

🛠 Deep Dive

Internals, code, architecture.

5. Executors — the easy button

concurrent.futures gives a uniform API:

  • ThreadPoolExecutor for IO-bound.
  • ProcessPoolExecutor for CPU-bound.

as_completed(futures) streams results as they arrive; map preserves order. Set max_workers deliberately (≈ 2-4 × cores for IO, cores for CPU).
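
The difference between map and as_completed is easiest to see side by side. This is a sketch with a made-up fetch function whose sleep stands in for a network round trip; swapping ThreadPoolExecutor for ProcessPoolExecutor would leave the calling code the same.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch(delay: float) -> float:
    """Hypothetical IO-bound task; the sleep stands in for a network round trip."""
    time.sleep(delay)
    return delay


def run_ordered(delays: list[float]) -> list[float]:
    """map() yields results in input order, even if later tasks finish first."""
    with ThreadPoolExecutor(max_workers=len(delays)) as pool:
        return list(pool.map(fetch, delays))


def run_streaming(delays: list[float]) -> list[float]:
    """as_completed() yields each future as soon as it finishes."""
    with ThreadPoolExecutor(max_workers=len(delays)) as pool:
        futures = [pool.submit(fetch, d) for d in delays]
        return [f.result() for f in as_completed(futures)]


if __name__ == "__main__":
    delays = [0.3, 0.05, 0.15]
    print("map:         ", run_ordered(delays))
    print("as_completed:", run_streaming(delays))
```

Reach for as_completed when you want to start processing the fastest responses immediately, and for map when downstream code depends on input order.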

6. The actor model

Each actor owns its state and a mailbox; messages are processed sequentially per actor. No shared memory → no locks. Used by Erlang/Elixir (OTP), Akka (JVM), Orleans (.NET). Great fit for stateful agents, IoT, game servers. Failure isolation is built in: let the actor crash and have a supervisor restart it.
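
Python has no built-in actor runtime, but the core idea (private state plus a mailbox drained one message at a time) can be sketched on top of asyncio.Queue. CounterActor and its message names are invented for this example, and supervision and restart are left out entirely.

```python
import asyncio


class CounterActor:
    """Owns its state; the only way in is a message on the mailbox."""

    def __init__(self) -> None:
        self._mailbox: asyncio.Queue = asyncio.Queue()
        self._count = 0  # private state, touched only by _run()

    async def _run(self) -> None:
        # Messages are handled strictly one at a time, so no lock is needed.
        while True:
            kind, reply = await self._mailbox.get()
            if kind == "incr":
                self._count += 1
            elif kind == "get":
                reply.set_result(self._count)
            elif kind == "stop":
                reply.set_result(None)
                return

    def start(self) -> asyncio.Task:
        return asyncio.create_task(self._run())

    def tell(self, kind: str) -> None:
        """Fire-and-forget send."""
        self._mailbox.put_nowait((kind, None))

    async def ask(self, kind: str):
        """Send a message and wait for the actor's reply."""
        reply = asyncio.get_running_loop().create_future()
        self._mailbox.put_nowait((kind, reply))
        return await reply


async def demo(senders: int, per_sender: int) -> int:
    actor = CounterActor()
    task = actor.start()

    async def sender() -> None:
        for _ in range(per_sender):
            actor.tell("incr")
            await asyncio.sleep(0)  # interleave with the other senders

    await asyncio.gather(*(sender() for _ in range(senders)))
    total = await actor.ask("get")
    await actor.ask("stop")
    await task
    return total


if __name__ == "__main__":
    print(asyncio.run(demo(senders=5, per_sender=100)))
```

Because the mailbox is FIFO and only _run touches _count, the "get" message sees every increment sent before it. Real actor systems add what this sketch lacks: supervision trees, location transparency and bounded mailboxes.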

7. The async pitfalls people learn the hard way

  • Forgotten await → the coroutine object is created but never runs; no exception is raised, only a "coroutine ... was never awaited" RuntimeWarning that is easy to miss.
  • Mixing sync and async libs → silent blocking. Always check "is this lib async-native?".
  • CPU work in event loop → tail latency explodes. Offload to a thread/process.
  • Unbounded concurrency → fan out to 10k requests, crash the server. Use asyncio.Semaphore or a worker pool.
  • Cancellation → tasks left running after a timeout. Use asyncio.TaskGroup or wrap with asyncio.timeout().
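
Two of these pitfalls, unbounded fan-out and work that outlives its timeout, have short fixes. The sketch below uses asyncio.Semaphore to cap in-flight calls and asyncio.wait_for for the deadline (chosen here instead of asyncio.timeout() only because it also runs on Python versions before 3.11). call_downstream is a hypothetical stand-in for a real client call.

```python
import asyncio


async def call_downstream(i: int, stats: dict[str, int]) -> int:
    """Hypothetical downstream call; records how many calls are in flight."""
    stats["in_flight"] += 1
    stats["peak"] = max(stats["peak"], stats["in_flight"])
    try:
        await asyncio.sleep(0.01)
        return i * 2
    finally:
        stats["in_flight"] -= 1


async def bounded_fan_out(n: int, limit: int) -> tuple[list[int], int]:
    """Run n calls with at most `limit` in flight; returns (results, peak)."""
    sem = asyncio.Semaphore(limit)
    stats = {"in_flight": 0, "peak": 0}

    async def guarded(i: int) -> int:
        async with sem:  # waits here once `limit` calls are already running
            return await call_downstream(i, stats)

    results = await asyncio.gather(*(guarded(i) for i in range(n)))
    return list(results), stats["peak"]


async def with_deadline(seconds: float) -> str:
    """wait_for cancels the inner coroutine on timeout instead of leaving it running."""
    try:
        await asyncio.wait_for(asyncio.sleep(10), timeout=seconds)
        return "finished"
    except asyncio.TimeoutError:
        return "cancelled"


if __name__ == "__main__":
    _, peak = asyncio.run(bounded_fan_out(n=50, limit=5))
    print("peak concurrency:", peak)
    print(asyncio.run(with_deadline(0.01)))
```

The semaphore creates all 50 coroutines up front but lets only 5 past the gate at a time. For very large fan-outs, a fixed set of workers pulling from a queue avoids creating thousands of idle tasks.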

8. Memory model basics (not just Python)

  • Happens-before: a write in thread A is visible to thread B only with synchronisation (lock, atomic, barrier).
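  • Volatile in Java / std::atomic in C++ ≠ mutex. Java's volatile gives visibility and ordering but not atomic compound ops (count++ is still a lost-update bug). std::atomic does make single-variable read-modify-write operations such as fetch_add atomic, but neither protects an invariant that spans several variables.
  • Data race vs race condition: data race = unsynchronised concurrent access to the same memory with at least one write (undefined behaviour in C and C++); race condition = ordering bug (semantically wrong result, possible even with no data race).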
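
The same distinction shows up in Python. The GIL keeps the interpreter itself from corrupting memory, but a read-modify-write split across several bytecodes can still lose updates. A minimal sketch, with Counter and hammer as illustrative names:

```python
import threading


class Counter:
    """Shared counter whose increment is a read-modify-write compound operation."""

    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def incr_unsafe(self) -> None:
        # Read, add, write: another thread can run between the steps,
        # so updates may be lost (a race condition), even under the GIL.
        current = self.value
        self.value = current + 1

    def incr_safe(self) -> None:
        # The lock makes the compound operation atomic with respect to
        # every other thread that takes the same lock.
        with self._lock:
            self.value += 1


def hammer(method_name: str, threads: int = 8, per_thread: int = 10_000) -> int:
    """Call the chosen increment from several threads and return the final count."""
    counter = Counter()
    method = getattr(counter, method_name)

    def work() -> None:
        for _ in range(per_thread):
            method()

    pool = [threading.Thread(target=work) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return counter.value


if __name__ == "__main__":
    print("with lock:   ", hammer("incr_safe"))
    print("without lock:", hammer("incr_unsafe"))
```

The locked version always reaches threads × per_thread. The unlocked version may or may not lose updates on a given run, which is exactly what makes this class of bug hard to reproduce.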

🚀 In Practice

Trade-offs, exercises, what to ship today.

9. Higher-level patterns

  • Producer-consumer with bounded queue (backpressure).
  • Pipeline of async stages (each stage a coroutine with input/output channels).
  • Fan-out / fan-in for parallel sub-tasks + gather.
  • Circuit breaker to stop hammering a failing downstream.
  • Rate limiter (token bucket) for fairness and external API quotas.
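
As one example from this list, a token bucket is small enough to sketch in full. The class below is an illustration, not a production limiter: it is not thread-safe, it rejects instead of waiting, and the injectable clock parameter is an assumption added so the behaviour can be tested without sleeping.

```python
import time
from typing import Callable


class TokenBucket:
    """Allows bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self._clock = clock      # injectable so tests can control time
        self._tokens = capacity  # start with a full bucket
        self._last = clock()

    def try_acquire(self, cost: float = 1.0) -> bool:
        """Take `cost` tokens if available; return False instead of blocking."""
        now = self._clock()
        # Refill for the elapsed time, never above capacity.
        self._tokens = min(self.capacity,
                           self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(rate=5.0, capacity=10.0)
    allowed = sum(bucket.try_acquire() for _ in range(25))
    print(f"{allowed} of 25 immediate requests allowed")
```

capacity sets how large a burst is tolerated and rate sets the sustained throughput. To share the bucket between threads, wrap try_acquire in a threading.Lock; in asyncio code a single event loop already serialises the calls.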

10. Debugging concurrency

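  • py-spy record -o profile.svg --pid <pid> for a flame graph of a running process; py-spy top --pid <pid> for a live view; py-spy dump --pid <pid> for a one-off stack dump.
  • asyncio.run(main(), debug=True) warns on slow callbacks (by default anything that holds the loop for more than 100 ms).
  • faulthandler (for example faulthandler.dump_traceback_later(timeout)) dumps every thread's stack when you suspect a deadlock; threading.settrace traces what each thread executes.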
  • Always log task identity (asyncio.current_task().get_name()) when chasing weird interleavings.
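
A small sketch of the task-identity tip: name each task when you create it, then include that name in every log line. The logger name and the fetch-N naming scheme are arbitrary choices for this example.

```python
import asyncio
import logging

log = logging.getLogger("interleave")  # hypothetical logger name


async def step(seen: list[str]) -> None:
    task = asyncio.current_task()
    name = task.get_name() if task else "no-task"
    log.info("start %s", name)
    seen.append(name)
    await asyncio.sleep(0)  # yield so the other tasks get to run
    log.info("end %s", name)


async def main() -> list[str]:
    seen: list[str] = []
    tasks = [asyncio.create_task(step(seen), name=f"fetch-{i}") for i in range(3)]
    await asyncio.gather(*tasks)
    return seen


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
    # debug=True additionally turns on asyncio's slow-callback warnings.
    print(asyncio.run(main(), debug=True))
```

Without explicit names, tasks get generated ones like Task-1 and Task-2, which are much harder to correlate with the request or job that spawned them.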

11. Distributed extension

Once you outgrow one machine, the same models scale:

  • Threads → thread pool per service.
  • Asyncio → many services on one box.
  • Multiprocessing → workers / pods.
  • Actors → Akka / Orleans cluster or Ray actors.

12. What to take away

When the question is "Why is this slow?", strong answers identify IO-bound vs CPU-bound first, then suggest the matching model. Mentioning the GIL without that context is a yellow flag: it's a tool, not a curse.

    Resources

    Practice Problem: Design Hit Counter (Medium)