engineering · intermediate · 12m · 2026-06-09

Concurrency Models — Threads, Asyncio, GIL, Actors

Session 16 of the 48-session learning series.

Date: Sun, 2026-06-21 · Time: 14:30–16:30 IST · Track: 🧱 OOP & Languages (OOP) · Parent 28-day topic: Day 10 · Est. read: 2 h

Why this session matters

This is Session 16 of 48 in the OOP & Languages track. It builds on the rhythm of one focused topic, paced so you have time to actually absorb it rather than rush.

Agenda

  • The GIL — what it actually locks, why it exists, when it bites
  • Threading — when it works in Python (I/O-bound), classic primitives
  • Multiprocessing — true parallelism, IPC cost, pickle pitfalls
  • Asyncio — single-threaded cooperative scheduling, event loop, await
  • Actors and CSP — Erlang/Elixir, Go channels, Akka

Pre-read (skim before the session)

Deep dive

1. What problem are we even solving?

Concurrency ≠ parallelism. Concurrency = structuring a program to handle multiple things; parallelism = actually doing multiple things at once on multiple cores.

A web server handling 10K connections needs concurrency — even on one core, it must juggle. A matrix multiply needs parallelism — divide rows across cores.

2. The GIL — what it locks

CPython's Global Interpreter Lock is a mutex that protects access to CPython internals (reference counts, the import system, etc.). Only one thread can execute Python bytecode at a time.

  • I/O calls release the GIL (read, recv, sleep, requests.get).
  • C extensions can release the GIL (NumPy, PyTorch's CPU ops, lxml).
  • Pure-Python CPU work cannot run on multiple cores concurrently.

So:

  • ✅ Threads work great for I/O-bound work (HTTP, DB, file).
  • ❌ Threads do not speed up CPU-bound Python loops.
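
A minimal sketch of the I/O-bound case, assuming time.sleep as a stand-in for a blocking network or disk call (it releases the GIL in the same way). The helper names fake_io, run_sequential and run_threaded are hypothetical, and the exact timings will vary by machine.

```python
import threading
import time


def fake_io(delay: float) -> None:
    # time.sleep releases the GIL, standing in for a blocking I/O call.
    time.sleep(delay)


def run_sequential(n: int, delay: float) -> float:
    """Run n fake I/O calls one after another; return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(n):
        fake_io(delay)
    return time.perf_counter() - start


def run_threaded(n: int, delay: float) -> float:
    """Run n fake I/O calls on n threads; return elapsed seconds."""
    start = time.perf_counter()
    threads = [threading.Thread(target=fake_io, args=(delay,)) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"sequential: {run_sequential(4, 0.2):.2f}s")
    print(f"threaded:   {run_threaded(4, 0.2):.2f}s")
```

The threaded run should take roughly one delay rather than four, because the waits overlap. Swapping fake_io for a pure-Python loop would remove that gain on a standard (GIL) build.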

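Python 3.13 ships an experimental free-threaded (no-GIL) build (PEP 703), and Python 3.14 makes that build officially supported, though still optional (PEP 779). C extensions must explicitly declare free-threading support, so check your dependencies before relying on it; the standard build still has the GIL.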

3. Threading — the boring but useful kind

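import queue
import threading
import requests

urls = ["https://example.com", "https://example.org"]  # placeholder URLs

def fetch(url, out):
    # requests.get releases the GIL while it waits on the network.
    out.put((url, requests.get(url, timeout=10).status_code))

q = queue.Queue()
threads = [threading.Thread(target=fetch, args=(u, q)) for u in urls]
for t in threads:
    t.start()
for t in threads:
    t.join()
# All threads have finished, so qsize() is exact; failed fetches add nothing.
results = [q.get_nowait() for _ in range(q.qsize())]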

Common primitives:

Primitive | Use case
Lock | Mutual exclusion
RLock | Re-entrant lock (same thread can re-acquire)
Semaphore(n) | Limit n concurrent holders (connection pool)
Event | One-shot signal (e.g., "config loaded")
Condition | Wait until some predicate holds; pair with a Lock
Queue | Thread-safe FIFO; the right primitive for most producer/consumer code

Rule: prefer queue.Queue over manual locks. Most thread bugs are bad locking.
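
A minimal sketch of that rule: a bounded producer/consumer built on queue.Queue with no manual locks. The SENTINEL marker, the function names and the squaring step are assumptions made for illustration.

```python
import queue
import threading

SENTINEL = object()  # hypothetical end-of-stream marker


def producer(q: queue.Queue, items: list[int]) -> None:
    for item in items:
        q.put(item)  # blocks while the queue is full, which gives backpressure
    q.put(SENTINEL)


def consumer(q: queue.Queue, results: list[int]) -> None:
    while True:
        item = q.get()  # blocks while the queue is empty
        if item is SENTINEL:
            break
        results.append(item * item)


def run_pipeline(items: list[int], maxsize: int = 4) -> list[int]:
    """Push items through a bounded queue and return their squares."""
    q: queue.Queue = queue.Queue(maxsize=maxsize)
    results: list[int] = []
    workers = [
        threading.Thread(target=producer, args=(q, items)),
        threading.Thread(target=consumer, args=(q, results)),
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results
```

With a single consumer the output order matches the input order. With several consumers you would need one sentinel per consumer, and ordering is no longer guaranteed.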

4. Multiprocessing — real parallelism on CPython

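from multiprocessing import Pool

def cpu_work(x):
    # Pure-Python CPU-bound loop: the kind of work threads cannot speed up.
    return sum(i * i for i in range(x))

if __name__ == "__main__":  # required when workers are started with spawn
    with Pool(8) as pool:
        results = pool.map(cpu_work, range(100))

  • Each process has its own interpreter and GIL → real parallelism.
  • IPC cost: arguments and return values are pickled. Large NumPy arrays are expensive to pickle and copy; use shared memory (multiprocessing.shared_memory or numpy.memmap) for those.
  • Spawn vs fork: fork is fast, but the child inherits the parent's open file descriptors and any locks held at fork time, which is risky in multithreaded programs. spawn starts a fresh interpreter (slower, but clean) and is the default on macOS and Windows. Linux defaulted to fork through Python 3.13 and defaults to forkserver from 3.14. Always test the start method you'll run in prod.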
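
A minimal sketch of the shared-memory idea using only the standard library. To stay self-contained it attaches to the block from the same process; in real code the attach step would run in a worker process that receives only the block's name. The function name roundtrip is hypothetical, and the payload is assumed to be non-empty.

```python
from multiprocessing import shared_memory


def roundtrip(payload: bytes) -> bytes:
    """Write payload into a named shared block and read it back via a second handle."""
    # Owner side: allocate a named block and copy the payload in once.
    owner = shared_memory.SharedMemory(create=True, size=len(payload))
    try:
        owner.buf[: len(payload)] = payload
        # Reader side: normally done in a child process, which needs only
        # the short name string rather than a pickled copy of the data.
        reader = shared_memory.SharedMemory(name=owner.name)
        try:
            return bytes(reader.buf[: len(payload)])
        finally:
            reader.close()
    finally:
        owner.close()
        owner.unlink()  # free the block; exactly one process should do this


if __name__ == "__main__":
    print(roundtrip(b"large array bytes would go here"))
```

The point of the design is that only the name crosses the process boundary, so the cost of sharing does not grow with the size of the data.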

5. Asyncio — single-threaded cooperative

The event loop runs one coroutine at a time. When a coroutine awaits a future, the loop runs other ready coroutines.

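import asyncio
import aiohttp

urls = ["https://example.com", "https://example.org"]  # placeholder URLs

async def fetch(session, url):
    async with session.get(url) as resp:
        return await resp.text()

async def main():
    # One shared session; gather runs all fetches concurrently on one thread.
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(*(fetch(session, u) for u in urls))

results = asyncio.run(main())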

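With coroutines, 10K concurrent HTTP requests fit on one thread in roughly 200 MB of RAM. One OS thread per request costs far more, since each thread reserves its own stack and adds scheduler overhead, and it can exhaust memory or OS thread limits.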

6. Async pitfalls

  • Don't call blocking code inside a coroutine — it stalls the loop. Use asyncio.to_thread() for the rare requests.get you can't replace.
  • CPU-bound work in asyncio still serialises. Hand it to loop.run_in_executor() with a ProcessPoolExecutor.
  • Cancellation: a Task can be cancelled at any await. Catch CancelledError only to clean up (or use try/finally), then re-raise it.
  • Async libs everywhere — mixing sync and async libs hurts. Pick a stack (FastAPI + httpx + asyncpg, or Sanic + aiohttp + aiopg) and stick with it.
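
A minimal sketch of the first and third pitfalls, assuming Python 3.9+ for asyncio.to_thread. The names blocking_call and heartbeat are hypothetical; time.sleep stands in for a sync library call, and the -1 marker stands in for real cleanup work.

```python
import asyncio
import time


def blocking_call(delay: float) -> str:
    # Stand-in for a sync library call that cannot be replaced.
    time.sleep(delay)
    return "done"


async def heartbeat(ticks: list[int], interval: float) -> None:
    # Appends a tick per interval; it only advances while the loop is free.
    try:
        while True:
            await asyncio.sleep(interval)
            ticks.append(len(ticks))
    except asyncio.CancelledError:
        ticks.append(-1)  # cleanup marker
        raise  # re-raise so the task really ends up cancelled


async def main() -> tuple[str, list[int]]:
    ticks: list[int] = []
    beat = asyncio.create_task(heartbeat(ticks, 0.02))
    # Offload the blocking call to a worker thread so the loop keeps running.
    result = await asyncio.to_thread(blocking_call, 0.2)
    beat.cancel()
    try:
        await beat
    except asyncio.CancelledError:
        pass  # expected: we cancelled the task ourselves
    return result, ticks


if __name__ == "__main__":
    print(asyncio.run(main()))
```

If blocking_call were invoked directly inside main, the heartbeat would record no ticks during the call, because the loop would be stalled for its whole duration.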

7. Picking a model

Workload | Pick
Many concurrent I/O (10K+ requests) | asyncio
Moderate I/O (100s of requests), legacy sync libs | threading + ThreadPoolExecutor
CPU-bound, single machine | multiprocessing (or Cython/Numba/C extension)
CPU-bound, multi-machine | Spark, Dask, Ray
GUI / event-driven UI | event loop (Tkinter, Qt)
Real-time, low-latency mixed work | Go, Rust, Erlang/Elixir

8. Actors and CSP — alternative models

Actors (Erlang, Akka): isolated processes, mailbox, message passing. No shared state → no data races. Crashes are local; supervision trees restart actors.
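
A rough Python approximation of the actor idea, using a thread and a queue.Queue mailbox. It is a sketch only: CounterActor and its message names are hypothetical, and it has none of the supervision or process isolation that Erlang or Akka provide.

```python
import queue
import threading


class CounterActor:
    """Hypothetical actor: private state, one mailbox, one worker thread."""

    def __init__(self) -> None:
        self._mailbox: queue.Queue = queue.Queue()
        self._count = 0  # touched only by the actor's own thread
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self) -> None:
        # Messages are handled one at a time, so no lock guards _count.
        while True:
            msg, reply = self._mailbox.get()
            if msg == "stop":
                break
            if msg == "incr":
                self._count += 1
            elif msg == "get":
                reply.put(self._count)

    def send(self, msg: str) -> None:
        """Fire-and-forget message."""
        self._mailbox.put((msg, None))

    def ask(self, msg: str) -> int:
        """Send a message and wait for the reply."""
        reply: queue.Queue = queue.Queue(maxsize=1)
        self._mailbox.put((msg, reply))
        return reply.get()

    def stop(self) -> None:
        self.send("stop")
        self._thread.join()
```

Callers never touch the counter directly; they can only send messages, which is what removes the need for locking around the state.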

CSP (Go channels): independent goroutines communicate by passing values on channels. "Don't communicate by sharing memory; share memory by communicating."

ch := make(chan int)
go func() { ch <- compute() }()
result := <-ch

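Both models rule out data races on shared state by construction, although deadlocks and message-ordering bugs are still possible. In Python, an asyncio.Queue between tasks is a CSP-flavoured pattern, and it is usually easier to reason about than shared state guarded by locks.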
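
A minimal sketch of that CSP-flavoured pattern in Python, with an asyncio.Queue playing the role of a bounded channel. Using None as the close signal and the names producer, consumer and main are assumptions made for this example.

```python
import asyncio


async def producer(ch: asyncio.Queue, n: int) -> None:
    for i in range(n):
        await ch.put(i)  # suspends while the channel is full
    await ch.put(None)  # None is used here as a close signal


async def consumer(ch: asyncio.Queue) -> int:
    total = 0
    while (item := await ch.get()) is not None:
        total += item
    return total


async def main(n: int = 100) -> int:
    ch: asyncio.Queue = asyncio.Queue(maxsize=8)
    # The two tasks share nothing except the channel.
    _, total = await asyncio.gather(producer(ch, n), consumer(ch))
    return total


if __name__ == "__main__":
    print(asyncio.run(main()))  # 4950
```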

9. Common bug list (you will hit these)

  1. Double-checked locking without volatile: the classic singleton bug in Java; in Python you can use a module-level singleton instead.
  2. Forgetting to join: a non-daemon thread keeps the process alive, while a daemon thread is killed abruptly at interpreter exit and may not release its resources cleanly.
  3. Deadlock from inconsistent lock order: always acquire locks in the same global order.
  4. Pickling unpicklable objects in multiprocessing: DB connections, file handles. Pass paths/config, open in child.
  5. Calling a coroutine function without await: you get a coroutine object that never runs. Python emits a "coroutine ... was never awaited" RuntimeWarning, and mypy flags it (unused-coroutine). ruff's RUF006 covers a related mistake: not keeping a reference to the result of asyncio.create_task.
  6. Sharing a requests.Session across threads: its thread safety is not guaranteed; use one per thread or use httpx.
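
A minimal sketch of the fix for item 3, assuming a hypothetical Account class whose creation counter defines the global lock order. Any stable total order over the locks would work equally well.

```python
import threading


class Account:
    _next_order = 0  # hypothetical source of a global lock order

    def __init__(self, balance: int) -> None:
        self.balance = balance
        self.lock = threading.Lock()
        self.order = Account._next_order
        Account._next_order += 1


def transfer(src: Account, dst: Account, amount: int) -> None:
    """Move amount from src to dst without risking a lock-order deadlock."""
    if src is dst:
        return  # nothing to move, and locking the same Lock twice would hang
    # Always lock the lower-numbered account first, whatever the direction.
    first, second = sorted((src, dst), key=lambda a: a.order)
    with first.lock, second.lock:
        src.balance -= amount
        dst.balance += amount
```

Without the sort, one thread running transfer(a, b, ...) and another running transfer(b, a, ...) could each hold one lock and wait forever for the other.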

10. Production heuristics

  • Connection pool sized to worker count — DB pool of 10 + 50 workers = bottleneck. Pool size ≥ workers, or use async DB driver.
  • Backpressure — bounded queues, semaphores, drop or shed when full. Unbounded queues turn slowdowns into OOM.
  • Timeouts on every external call. No exceptions.
  • Profile before optimising: py-spy and cProfile for sync code; aiomonitor for async.
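
A minimal sketch of the backpressure heuristic: a bounded queue that sheds work when full instead of growing without limit. The submit helper and the bound of 3 are assumptions chosen for illustration.

```python
import queue


def submit(q: queue.Queue, job: object) -> bool:
    """Try to enqueue a job; return False (shed it) if the queue is full."""
    try:
        q.put_nowait(job)
        return True
    except queue.Full:
        return False


jobs: queue.Queue = queue.Queue(maxsize=3)  # hypothetical bound
accepted = [submit(jobs, n) for n in range(5)]  # no consumer is draining the queue
print(accepted)  # [True, True, True, False, False]
```

Whether to drop the job, return an error to the caller, or block with a timeout (q.put(job, timeout=...)) depends on the workload; the key is that the queue has a bound at all.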

Reading material

In-depth research material

Video reference

▶︎ Raymond Hettinger — Concurrency from the Ground Up: A New Model

Pick a quiet 30 minutes during this session to actually watch it. Don't multitask.

LeetCode — Design Bounded Blocking Queue

Post-session checklist

By the end of this session you should be able to:

  • Explain exactly what the GIL locks and when it releases.
  • Choose between asyncio, threading, multiprocessing for a given workload.
  • Implement a bounded producer/consumer with a queue.
  • List 3 common deadlock causes and how to avoid each.
  • Describe the actor model in 3 sentences.
  • Solve design-bounded-blocking-queue — two semaphores + a lock; the classic channel pattern.
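
One possible shape for the last item, sketched with two semaphores and a lock as the checklist suggests. The method names enqueue, dequeue and size are assumptions about the exercise, so treat this as one workable approach rather than a reference solution.

```python
import threading
from collections import deque


class BoundedBlockingQueue:
    def __init__(self, capacity: int) -> None:
        self._items: deque = deque()
        self._lock = threading.Lock()  # guards the deque itself
        self._free = threading.Semaphore(capacity)  # counts empty slots
        self._used = threading.Semaphore(0)  # counts filled slots

    def enqueue(self, item: int) -> None:
        self._free.acquire()  # wait for an empty slot
        with self._lock:
            self._items.append(item)
        self._used.release()  # signal one more item

    def dequeue(self) -> int:
        self._used.acquire()  # wait for an item
        with self._lock:
            item = self._items.popleft()
        self._free.release()  # signal one more empty slot
        return item

    def size(self) -> int:
        with self._lock:
            return len(self._items)
```

The semaphores handle blocking on full and empty, while the lock only protects the container, so no thread ever waits while holding the lock.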

Generated from sessions_data.py + content_part*.py. To edit a video / leetcode / title, edit the data file and re-run write_sessions.py.