Concurrency Models — Threads, Asyncio, GIL, Actors
Session 16 of the 48-session learning series.
Date: Sun, 2026-06-21 · Time: 14:30–16:30 IST · Track: 🧱 OOP & Languages (OOP) · Parent 28-day topic: Day 10 · Est. read: 2 h
Why this session matters
This is Session 16 of 48 in the OOP & Languages track. It builds on the rhythm of one focused topic, paced so you have time to actually absorb it rather than rush.
Agenda
- The GIL — what it actually locks, why it exists, when it bites
- Threading — when it works in Python (I/O-bound), classic primitives
- Multiprocessing — true parallelism, IPC cost, pickle pitfalls
- Asyncio — single-threaded cooperative scheduling, event loop, await
- Actors and CSP — Erlang/Elixir, Go channels, Akka
Pre-read (skim before the session)
- Python GIL — Larry Hastings PyCon talk
- Real Python — Async IO in Python
- PEP 3156 — Asynchronous IO Support
- Go Concurrency Patterns
Deep dive
1. What problem are we even solving?
Concurrency ≠ parallelism. Concurrency = structuring a program to handle multiple things; parallelism = actually doing multiple things at once on multiple cores.
A web server handling 10K connections needs concurrency — even on one core, it must juggle. A matrix multiply needs parallelism — divide rows across cores.
2. The GIL — what it locks
CPython's Global Interpreter Lock is a mutex that protects access to CPython internals (reference counts, the import system, etc.). Only one thread can execute Python bytecode at a time.
- I/O calls release the GIL (read, recv, sleep, requests.get).
- C extensions can release the GIL (NumPy, PyTorch's CPU ops, lxml).
- Pure-Python CPU work cannot run on multiple cores concurrently.
So:
- ✅ Threads work great for I/O-bound work (HTTP, DB, file).
- ❌ Threads do not speed up CPU-bound Python loops.
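Python 3.13 ships an experimental free-threaded (no-GIL) build (PEP 703). Extension modules must explicitly declare that they are safe to run without the GIL, so treat it as opt-in for now; a rough guess for mainstream adoption is 3.15–3.16.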
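A minimal sketch of the I/O-bound case above. Here `time.sleep` stands in for a blocking I/O call (it releases the GIL while waiting), so four threads overlap their waits. The helper names and the 0.2-second delay are illustrative assumptions, not part of the session material.

```python
import threading
import time


def fake_io(delay: float) -> None:
    # time.sleep releases the GIL, much like a blocking read or recv would.
    time.sleep(delay)


def run_in_threads(n_threads: int, delay: float) -> float:
    """Run fake_io in n_threads threads and return the wall-clock time."""
    threads = [
        threading.Thread(target=fake_io, args=(delay,)) for _ in range(n_threads)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


elapsed = run_in_threads(n_threads=4, delay=0.2)
# The waits overlap, so the total is close to 0.2 s rather than 4 * 0.2 s.
print(f"4 threads, 0.2 s each: {elapsed:.2f} s total")
```

Swapping `fake_io` for a pure-Python loop would show the opposite result on a standard (GIL) build: the threads take turns and the total time does not drop.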
3. Threading — the boring but useful kind
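import threading, queue, requests
urls = ["https://example.com", "https://example.org"]  # placeholder URLs
def fetch(url, out):
    # requests.get blocks on the network; the GIL is released while it waits
    out.put((url, requests.get(url, timeout=10).status_code))
q = queue.Queue()
threads = [threading.Thread(target=fetch, args=(u, q)) for u in urls]
for t in threads: t.start()
for t in threads: t.join()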
Common primitives:
| Primitive | Use case |
|---|---|
| Lock | Mutual exclusion |
| RLock | Re-entrant lock (same thread can re-acquire) |
| Semaphore(n) | Limit n concurrent holders (connection pool) |
| Event | One-shot signal (e.g., "config loaded") |
| Condition | Wait until some predicate holds; pair with a Lock |
| Queue | Thread-safe FIFO; the right primitive for most producer/consumer code |
Rule: prefer queue.Queue over manual locks. Most thread bugs are bad locking.
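The rule above can be sketched as a bounded producer/consumer pair that shares nothing except a `queue.Queue`. This is a minimal sketch: the `SENTINEL` marker, the function names, and the queue size of 4 are assumptions chosen for illustration.

```python
import queue
import threading

SENTINEL = None  # hypothetical end-of-stream marker


def producer(q: queue.Queue, items: list[int]) -> None:
    for item in items:
        q.put(item)  # blocks while the queue is full (backpressure)
    q.put(SENTINEL)


def consumer(q: queue.Queue, out: list[int]) -> None:
    while True:
        item = q.get()  # blocks while the queue is empty
        if item is SENTINEL:
            break
        out.append(item * item)


def run_pipeline(items: list[int]) -> list[int]:
    q: queue.Queue = queue.Queue(maxsize=4)
    out: list[int] = []
    workers = [
        threading.Thread(target=producer, args=(q, items)),
        threading.Thread(target=consumer, args=(q, out)),
    ]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    return out


print(run_pipeline([1, 2, 3, 4, 5]))  # [1, 4, 9, 16, 25]
```

No explicit lock appears anywhere: the queue does the locking internally, and only the consumer thread ever touches `out`.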
4. Multiprocessing — real parallelism on CPython
from multiprocessing import Pool
def cpu_work(x):
    return sum(i * i for i in range(x))
if __name__ == "__main__":  # guard is required under the spawn start method
    with Pool(8) as pool:
        results = pool.map(cpu_work, range(100))
- Each process has its own interpreter and GIL → real parallelism.
- IPC cost: arguments and return values are pickled, so passing large NumPy arrays is expensive. Use shared memory (multiprocessing.shared_memory or numpy.memmap) for those.
- Spawn vs fork: on Linux, fork (the default before Python 3.14) is fast, but the child inherits the parent's open file handles. On macOS and Windows the default is spawn (slower, but clean). Always test the start method you'll run in prod.
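A minimal sketch of the shared-memory idea, using only the standard library. To keep it short, the "worker" side is simulated in the same process. In a real program only the block's name (a short string) would be pickled and sent to the child, which would then attach in the same way. The 16-byte size and the variable names are illustrative assumptions.

```python
from multiprocessing import shared_memory

# Owner side: allocate a named block and write into it.
owner = shared_memory.SharedMemory(create=True, size=16)
owner.buf[:4] = bytes([1, 2, 3, 4])

# Worker side (simulated here in the same process): attach by name.
worker = shared_memory.SharedMemory(name=owner.name)
first_four = bytes(worker.buf[:4])  # reads the owner's bytes
worker.buf[0] = 99                  # visible to the owner, no pickling involved

seen_by_owner = owner.buf[0]

# Every handle is closed; exactly one side unlinks the block.
worker.close()
owner.close()
owner.unlink()

print(first_four, seen_by_owner)  # b'\x01\x02\x03\x04' 99
```

Forgetting `unlink()` leaves the block allocated after the program exits on POSIX systems, so it is worth pairing creation and cleanup in a `try`/`finally` in real code.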
5. Asyncio — single-threaded cooperative
The event loop runs one coroutine at a time. When a coroutine awaits a future, the loop runs other ready coroutines.
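import asyncio, aiohttp
urls = ["https://example.com", "https://example.org"]  # placeholder URLs
async def fetch(session, url):
    async with session.get(url) as resp:
        return await resp.text()
async def main():
    async with aiohttp.ClientSession() as session:
        # gather runs every fetch concurrently and keeps results in input order
        return await asyncio.gather(*[fetch(session, u) for u in urls])
results = asyncio.run(main())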
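As a rough order of magnitude, this pattern can hold about 10K concurrent HTTP requests on one thread in a few hundred MB of RAM. The same load with one OS thread per request means 10K thread stacks, which commonly exhausts memory or hits OS thread limits.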
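The cooperative hand-off described in this section can be made visible without any network access. In this minimal sketch, two coroutines record their steps in a shared list and yield to the loop with `asyncio.sleep(0)`. The `worker` and `log` names are illustrative assumptions.

```python
import asyncio


async def worker(name: str, log: list[str]) -> None:
    for step in range(3):
        log.append(f"{name}{step}")
        # await hands control back to the event loop so other tasks can run.
        await asyncio.sleep(0)


async def main() -> list[str]:
    log: list[str] = []
    await asyncio.gather(worker("a", log), worker("b", log))
    return log


log = asyncio.run(main())
print(log)  # ['a0', 'b0', 'a1', 'b1', 'a2', 'b2']
```

The list needs no lock: only one coroutine runs at a time, and a switch can happen only at an `await`.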
6. Async pitfalls
- Don't call blocking code inside a coroutine: it stalls the loop. Use asyncio.to_thread() for the rare requests.get you can't replace.
- CPU-bound work in asyncio still serialises. Hand it to loop.run_in_executor() with a ProcessPoolExecutor.
- Cancellation: a Task can be cancelled; you must handle CancelledError, clean up, and then re-raise it.
- Async libs everywhere: mixing sync and async libs hurts. Pick a stack (FastAPI + httpx + asyncpg, or Sanic + aiohttp + aiopg) and stick with it.
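Two of the pitfalls above can be sketched in a few lines: offloading a blocking call with `asyncio.to_thread()` (Python 3.9+), and cleaning up on cancellation. Here `blocking_call` is a hypothetical stand-in for a sync library call, and the delays are arbitrary.

```python
import asyncio
import time


def blocking_call(delay: float) -> str:
    # Stand-in for a sync library call such as requests.get.
    time.sleep(delay)
    return "done"


async def offload_blocking() -> tuple[list[str], float]:
    start = time.perf_counter()
    # Each call runs in a worker thread, so the event loop stays responsive.
    results = await asyncio.gather(
        *(asyncio.to_thread(blocking_call, 0.2) for _ in range(3))
    )
    return results, time.perf_counter() - start


async def cancellable(cleaned_up: list[bool]) -> None:
    try:
        await asyncio.sleep(60)
    except asyncio.CancelledError:
        cleaned_up.append(True)  # release resources here
        raise  # re-raise so the task is actually marked as cancelled


async def cancel_demo() -> tuple[bool, list[bool]]:
    cleaned_up: list[bool] = []
    task = asyncio.create_task(cancellable(cleaned_up))
    await asyncio.sleep(0)  # let the task start and reach its await
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return task.cancelled(), cleaned_up


results, elapsed = asyncio.run(offload_blocking())
was_cancelled, cleaned_up = asyncio.run(cancel_demo())
print(results, f"{elapsed:.2f}s", was_cancelled, cleaned_up)
```

Swallowing `CancelledError` without re-raising would leave the task looking like it finished normally, which tends to hide shutdown bugs.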
7. Picking a model
| Workload | Pick |
|---|---|
| Many concurrent I/O (10K+ requests) | asyncio |
| Moderate I/O (100s of requests), legacy sync libs | threading + ThreadPoolExecutor |
| CPU-bound, single machine | multiprocessing (or Cython/Numba/C extension) |
| CPU-bound, multi-machine | Spark, Dask, Ray |
| GUI / event-driven UI | event loop (Tkinter, Qt) |
| Real-time, low-latency mixed work | Go, Rust, Erlang/Elixir |
8. Actors and CSP — alternative models
Actors (Erlang, Akka): isolated processes, mailbox, message passing. No shared state → no data races. Crashes are local; supervision trees restart actors.
CSP (Go channels): independent goroutines communicate by passing values on channels. "Don't communicate by sharing memory; share memory by communicating."
ch := make(chan int)
go func() { ch <- compute() }()
result := <-ch
Both models eliminate most race conditions by construction. In Python, an asyncio.Queue between tasks is a CSP-flavoured pattern, and it usually stays manageable at sizes where hand-written locking does not.
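A minimal Python sketch of the channel pattern, assuming a bounded `asyncio.Queue` as a rough stand-in for a buffered Go channel. Using `None` as a "closed" marker is an illustrative convention, not something the library provides.

```python
import asyncio


async def producer(ch: asyncio.Queue, n: int) -> None:
    for i in range(n):
        await ch.put(i)  # suspends while the channel is full
    await ch.put(None)   # hypothetical "channel closed" marker


async def consumer(ch: asyncio.Queue) -> int:
    total = 0
    while (item := await ch.get()) is not None:
        total += item
    return total


async def main() -> int:
    ch: asyncio.Queue = asyncio.Queue(maxsize=2)  # small buffer
    _, total = await asyncio.gather(producer(ch, 10), consumer(ch))
    return total


total = asyncio.run(main())
print(total)  # 45
```

The two tasks never touch each other's state. The only shared object is the queue, which is the point of the CSP style.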
9. Common bug list (you will hit these)
- Double-checked locking without volatile: the classic Java singleton bug; in Python you can use a module-level singleton instead.
- Forgetting to join: a non-daemon thread keeps the process alive and holds resources; a daemon thread is killed abruptly at interpreter exit, possibly mid-write.
- Deadlock from inconsistent lock order: always acquire locks in the same global order.
- Pickling unpicklable objects in multiprocessing: DB connections, file handles. Pass paths/config, open in child.
- Calling a coroutine function without await: it silently returns a coroutine object that never runs. Python emits a "coroutine ... was never awaited" RuntimeWarning, and a type checker such as mypy flags it; ruff's RUF006 covers the related case of a created task whose reference is dropped.
- Sharing a requests.Session across threads: its thread safety is not guaranteed; use one per thread or use httpx.
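The lock-ordering rule from the list above, as a minimal sketch. The `Account` class and the choice of `id()` as the global ordering key are assumptions made for illustration; any stable, unique key (an account number, say) would do the same job.

```python
import threading


class Account:
    def __init__(self, balance: int) -> None:
        self.balance = balance
        self.lock = threading.Lock()


def transfer(src: Account, dst: Account, amount: int) -> None:
    # Always lock the account with the smaller id() first, whatever the
    # direction of the transfer, so opposite transfers cannot deadlock.
    # Assumes src is not dst (the same Lock cannot be acquired twice).
    first, second = sorted((src, dst), key=id)
    with first.lock, second.lock:
        src.balance -= amount
        dst.balance += amount


def repeat_transfer(src: Account, dst: Account, times: int) -> None:
    for _ in range(times):
        transfer(src, dst, 1)


a, b = Account(100), Account(100)
threads = [
    threading.Thread(target=repeat_transfer, args=(a, b, 1000)),
    threading.Thread(target=repeat_transfer, args=(b, a, 1000)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(a.balance, b.balance)  # 100 100
```

If each thread instead locked `src` first and `dst` second, the two threads could each hold one lock and wait forever for the other.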
10. Production heuristics
- Connection pool sized to worker count — DB pool of 10 + 50 workers = bottleneck. Pool size ≥ workers, or use async DB driver.
- Backpressure — bounded queues, semaphores, drop or shed when full. Unbounded queues turn slowdowns into OOM.
- Timeouts on every external call. No exceptions.
- Profile before optimising: py-spy, cProfile for sync; aiomonitor for async.
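The backpressure heuristic above can be sketched with a bounded `queue.Queue` that sheds work when full instead of growing without limit. The `try_enqueue` helper and the queue size of 2 are illustrative assumptions.

```python
import queue


def try_enqueue(q: queue.Queue, job: str) -> bool:
    """Accept the job if there is room; otherwise shed it."""
    try:
        q.put_nowait(job)
        return True
    except queue.Full:
        return False


jobs: queue.Queue = queue.Queue(maxsize=2)
accepted = [try_enqueue(jobs, f"job-{i}") for i in range(4)]
print(accepted)  # [True, True, False, False]
```

Whether to drop the job, return an error to the caller, or block with a timeout (`q.put(job, timeout=...)`) is a policy choice that depends on the workload.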
Reading material
- Python concurrent.futures docs
- Python asyncio docs
- Java Concurrency in Practice (Goetz)
- The Little Book of Semaphores (Downey)
In-depth research material
- PEP 703 — Making the GIL Optional
- Hettinger — Concurrency from the Ground Up
- Go Concurrency Patterns (Pike)
- Akka actor model docs
Video reference
▶︎ Raymond Hettinger — Concurrency from the Ground Up: A New Model
Pick a quiet 30 minutes during this session to actually watch it. Don't multitask.
LeetCode — Design Bounded Blocking Queue
- Link: https://leetcode.com/problems/design-bounded-blocking-queue/
- Difficulty: Medium
- Why this problem: Two semaphores (empty, full) + a lock. Maps directly to a thread-safe channel.
- Time-box: 30 minutes. Look up the editorial only after.
Post-session checklist
By the end of this session you should be able to:
- Explain exactly what the GIL locks and when it releases.
- Choose between asyncio, threading, multiprocessing for a given workload.
- Implement a bounded producer/consumer with a queue.
- List 3 common deadlock causes and how to avoid each.
- Describe the actor model in 3 sentences.
- Solve design-bounded-blocking-queue: two semaphores + a lock; the classic channel pattern.