concurrent.futures: ThreadPoolExecutor & ProcessPoolExecutor
concurrent.futures is the high-level API for running callables on a thread or process pool. ThreadPoolExecutor for I/O-bound work (the GIL is released during blocking I/O); ProcessPoolExecutor for CPU-bound work (one Python interpreter per process, true parallelism). submit() returns a Future; map() streams results in input order.
What it is
concurrent.futures is the standard library's high-level API for running callables in parallel. Two implementations: ThreadPoolExecutor and ProcessPoolExecutor. Same Future-returning API, different execution models.
This is the API to reach for first whenever Python parallelism is needed. Lower-level options (threading.Thread, multiprocessing.Process) exist, but they require hand-building the pool, the future, the result channel, the error handling. concurrent.futures packages all that.
Threads vs processes
The choice maps to the bottleneck.
I/O bound: the work spends most of its time waiting on the network, the disk, or the database. The GIL is released during these blocking syscalls. Threads work, and many of them (50, 100, 500 in extreme cases) fit in memory without much cost. ThreadPoolExecutor.
CPU bound: the work spends most of its time running Python bytecode. The GIL serialises Python execution, so threads do not help. Processes provide separate interpreters, and therefore real parallelism on multiple cores. ProcessPoolExecutor.
The grey zone: CPU work in a C extension that releases the GIL. NumPy operations, OpenCV, hashlib, zlib compression. Threads work here. Check the library's documentation; "releases the GIL" is the magic phrase.
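hashlib is one of the documented cases: CPython releases the GIL while hashing buffers larger than about 2 KB, so a thread pool genuinely parallelises checksumming. A minimal sketch, assuming paths is a list of file paths:

import hashlib
from concurrent.futures import ThreadPoolExecutor

def sha256_of(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # hashlib releases the GIL during update() on large buffers,
        # so these loops overlap across threads.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

with ThreadPoolExecutor(max_workers=4) as pool:
    digests = dict(zip(paths, pool.map(sha256_of, paths)))  # paths: assumed input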
The three iteration patterns
map: input order preserved, blocks on each result in turn. Best when results are needed in the same order as inputs and processed as they arrive.
as_completed: yields each Future as it finishes. Best when input order does not matter and downstream work should start as soon as any result is ready.
wait: blocks until all futures are done, the first completes, the first raises, or a timeout expires (return_when=ALL_COMPLETED, FIRST_COMPLETED, or FIRST_EXCEPTION). Best for hedged calls, first-to-finish patterns, and timed batches.
Pick based on what the downstream consumer needs. Using map when completion order is wanted, or as_completed when input order is wanted, leads to subtle ordering bugs.
Pool sizing
For threads: think about the I/O concurrency budget. If the downstream service handles 50 concurrent connections, do not exceed that. If the bottleneck is disk, more threads do not help. The default (min(32, cpu_count+4)) is rarely the right number for I/O-heavy work; tune to the workload.
For processes: usually equal to core count, sometimes core count - 1 to leave a core for the parent. Above that wastes memory for no gain. Below that leaves parallelism on the table.
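A sketch of both sizing heuristics; DOWNSTREAM_LIMIT is an assumed number standing in for a real connection budget:

import os

# CPU-bound: one worker per core; drop one if the parent stays busy.
cpu_workers = max(1, (os.cpu_count() or 1) - 1)

# I/O-bound: sized to what the downstream tolerates, not to cores.
DOWNSTREAM_LIMIT = 50  # assumed connection budget of the target service
io_workers = DOWNSTREAM_LIMIT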
What this does not give
Cancellation of running tasks. Once a worker has started, it runs to completion. The standard workaround is cooperative: pass a threading.Event or a cancel token, check it periodically, return early.
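A minimal sketch of that cooperative workaround; the sleep stands in for real work:

import threading
import time
from concurrent.futures import ThreadPoolExecutor

def worker(stop: threading.Event) -> str:
    for _ in range(1_000):
        if stop.is_set():        # cooperative check between work units
            return "cancelled early"
        time.sleep(0.01)         # placeholder for one unit of real work
    return "finished"

with ThreadPoolExecutor(max_workers=1) as pool:
    stop = threading.Event()
    fut = pool.submit(worker, stop)
    time.sleep(0.1)
    stop.set()                   # fut.cancel() would fail here: the task is running
    print(fut.result())          # prints "cancelled early"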
Cross-task communication. If task A's result feeds into task B, the chaining must be explicit. asyncio (or libraries like Dask, Ray) handle this better.
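For a simple two-step chain, the explicit version looks like this; task_a and task_b are hypothetical callables:

from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=4) as pool:
    fut_a = pool.submit(task_a)
    # No built-in "then": the caller blocks on A and hands its result to B.
    fut_b = pool.submit(task_b, fut_a.result())
    print(fut_b.result())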
Backpressure. Submit accepts as many futures as the caller pushes at it. For large fan-out, batch the submissions or use a bounded executor wrapper.
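One possible bounded wrapper, sketched with a semaphore; the class and its limit are illustrative, not a stdlib API:

import threading
from concurrent.futures import ThreadPoolExecutor

class BoundedExecutor:
    """submit() blocks once max_in_flight tasks are queued or running."""

    def __init__(self, pool: ThreadPoolExecutor, max_in_flight: int):
        self._pool = pool
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def submit(self, fn, *args, **kwargs):
        self._slots.acquire()                 # block the producer when full
        try:
            fut = self._pool.submit(fn, *args, **kwargs)
        except Exception:
            self._slots.release()             # do not leak a slot on failure
            raise
        fut.add_done_callback(lambda _: self._slots.release())
        return fut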
Primitives
- ThreadPoolExecutor (worker threads, GIL-shared)
- ProcessPoolExecutor (worker processes, separate interpreters)
- Future (result, exception, done, cancel, add_done_callback)
- as_completed (yields futures as they finish)
- wait (blocks until all done, the first completes, or the first raises; optional timeout)
Implementation
Each request blocks on the network. The GIL is released while waiting, so 32 requests can be in flight on 32 threads. The speedup over sequential is roughly min(worker count, average request latency divided by per-request CPU time); with enough workers, 30-100x is common.
from concurrent.futures import ThreadPoolExecutor, as_completed
import requests

urls = ["https://api.example.com/items/{}".format(i) for i in range(100)]

def fetch(url):
    r = requests.get(url, timeout=5)
    r.raise_for_status()
    return r.json()

with ThreadPoolExecutor(max_workers=32) as pool:
    # Order-preserving: results in url order
    results = list(pool.map(fetch, urls))

    # Or completion order, with per-future error handling:
    futures = {pool.submit(fetch, u): u for u in urls}
    for fut in as_completed(futures):
        url = futures[fut]
        try:
            data = fut.result()
        except Exception as e:
            print(f"{url} failed: {e}")

Image processing in pure Python is CPU-bound. Threads do not help (one runs at a time under the GIL). Processes do: 4 worker processes on 4 cores give roughly 4x throughput. The cost is pickling: arguments and results serialise across the process boundary.
from concurrent.futures import ProcessPoolExecutor

def process_image(path: str) -> dict:
    # CPU-heavy work: numpy ops, pure Python parsing, etc.
    with open(path, "rb") as f:
        data = f.read()
    return analyse(data)

if __name__ == "__main__":  # required under spawn (Windows, macOS default)
    with ProcessPoolExecutor(max_workers=4) as pool:
        for path, summary in zip(paths, pool.map(process_image, paths)):
            save(path, summary)

# Note: arguments and return values must be picklable.
# Local closures, lambdas, file handles will fail.

submit() returns a Future immediately. The Future supports done(), add_done_callback(), cancel() (which only succeeds if the task has not started), and result(). Exceptions raised in the worker are re-raised on result().
from concurrent.futures import ThreadPoolExecutor, TimeoutError  # futures' TimeoutError, distinct from the builtin before 3.11

with ThreadPoolExecutor(max_workers=4) as pool:
    fut = pool.submit(slow_operation, arg)

    fut.add_done_callback(lambda f: print("done:", f.result()))

    # Try to cancel; only works if the task has not started
    if not fut.cancel():
        try:
            result = fut.result(timeout=10)  # blocks up to 10s
        except TimeoutError:
            print("still running after 10s")
        except Exception as e:
            print(f"task raised: {e}")

wait offers finer control than map or as_completed. Used for hedged calls (return as soon as one succeeds), for partial-result patterns (return what is available so far), and for timed waits with custom logic.
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

def read_fastest(replicas):
    # Wrapped in a function so the early return is legal.
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = [pool.submit(read_replica, r) for r in replicas]

        done, pending = wait(futures, timeout=2.0, return_when=FIRST_COMPLETED)

        if done:
            winner = next(iter(done))
            for p in pending:
                p.cancel()  # may not stop already-running ones
            return winner.result()
        raise TimeoutError("all replicas slow")

Key points
- ThreadPoolExecutor: I/O-bound (HTTP, DB, file). The GIL is released during blocking syscalls so multiple threads make progress.
- ProcessPoolExecutor: CPU-bound (parsing, image processing, math). True parallelism, but pickling cost on every call.
- executor.map preserves input order. as_completed yields in completion order. Pick by what the downstream consumer needs.
- Always use a context manager (with). Forgetting to shut down leaks worker threads or processes.
- Default max_workers: min(32, cpu_count+4) for ThreadPoolExecutor, cpu_count for ProcessPoolExecutor. Tune to the workload.
Follow-up questions
- ThreadPoolExecutor or ProcessPoolExecutor: how to choose?
- What is the cost of ProcessPoolExecutor?
- Why not just use threading.Thread directly?
- Can a Future be cancelled mid-execution?
Gotchas
- Forgetting the `with` context manager leaks workers (especially processes, which keep file handles open)
- ProcessPoolExecutor uses spawn, not fork, on Windows (and by default on macOS since 3.8); everything that submits work belongs under the __main__ guard, as in the sketch after this list
- executor.map raises only on iteration; an exception in an early item is deferred until its result is reached
- Default max_workers is often wrong for the workload; size based on the bottleneck (cores for CPU, latency × QPS for I/O)
- cancel() never stops a running task; it only succeeds for futures still waiting in the queue, and a task a worker has already picked up runs to completion
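A minimal sketch of the spawn-safe layout from the second gotcha:

from concurrent.futures import ProcessPoolExecutor

def square(n: int) -> int:      # top-level function: picklable by name
    return n * n

if __name__ == "__main__":      # without this, spawn re-imports the module
    with ProcessPoolExecutor() as pool:  # and each worker tries to start a pool
        print(list(pool.map(square, range(8))))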