concurrent.futures: ThreadPoolExecutor & ProcessPoolExecutor
concurrent.futures is the high-level API for running callables on a thread or process pool. ThreadPoolExecutor for I/O-bound work (the GIL is released during blocking I/O); ProcessPoolExecutor for CPU-bound work (one Python interpreter per process, true parallelism). submit() returns a Future; map() streams results in input order.
What it is
concurrent.futures is the standard library's high-level API for running callables in parallel. Two implementations: ThreadPoolExecutor and ProcessPoolExecutor. Same Future-returning API, different execution models.
This is the API to reach for first whenever Python parallelism is needed. Lower-level options (threading.Thread, multiprocessing.Process) exist, but they require hand-building the pool, the future, the result channel, the error handling. concurrent.futures packages all that.
Threads vs processes
The choice maps to the bottleneck.
I/O bound: the work spends most of its time waiting on the network, the disk, or the database. The GIL is released during these blocking syscalls. Threads work, and many of them (50, 100, 500 in extreme cases) fit in memory without much cost. ThreadPoolExecutor.
CPU bound: the work spends most of its time running Python bytecode. The GIL serialises Python execution, so threads do not help. Processes provide separate interpreters, and therefore real parallelism on multiple cores. ProcessPoolExecutor.
The grey zone: CPU work in a C extension that releases the GIL. NumPy operations, OpenCV, hashlib, zlib compression. Threads work here. Check the library's documentation; "releases the GIL" is the magic phrase.
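hashlib is one of the documented cases: CPython releases the GIL while hashing buffers larger than about 2 KB, so a thread pool genuinely parallelises checksumming. A minimal sketch, assuming paths is a list of file paths:

import hashlib
from concurrent.futures import ThreadPoolExecutor

def sha256_of(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # hashlib releases the GIL during update() on large buffers,
        # so these loops overlap across threads.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

with ThreadPoolExecutor(max_workers=4) as pool:
    digests = dict(zip(paths, pool.map(sha256_of, paths)))  # paths: assumed input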
The three iteration patterns
map: input order preserved, blocks on each result in turn. Best when results are needed in the same order as inputs and processed as they arrive.
as_completed: yields each Future as it finishes. Best when input order does not matter and downstream work should start as soon as any result is ready.
wait: blocks until all futures are done, the first completes, the first raises, or a timeout expires (return_when=ALL_COMPLETED, FIRST_COMPLETED, or FIRST_EXCEPTION). Best for hedged calls, first-to-finish patterns, and timed batches.
Pick based on what the downstream consumer needs. Using map when completion order is wanted, or as_completed when input order is wanted, leads to subtle ordering bugs.
Pool sizing
For threads: think about the I/O concurrency budget. If the downstream service handles 50 concurrent connections, do not exceed that. If the bottleneck is disk, more threads do not help. The default (min(32, cpu_count+4)) is rarely the right number for I/O-heavy work; tune to the workload.
For processes: usually equal to core count, sometimes core count - 1 to leave a core for the parent. Above that wastes memory for no gain. Below that leaves parallelism on the table.
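A sketch of both sizing heuristics; DOWNSTREAM_LIMIT is an assumed number standing in for a real connection budget:

import os

# CPU-bound: one worker per core; drop one if the parent stays busy.
cpu_workers = max(1, (os.cpu_count() or 1) - 1)

# I/O-bound: sized to what the downstream tolerates, not to cores.
DOWNSTREAM_LIMIT = 50  # assumed connection budget of the target service
io_workers = DOWNSTREAM_LIMIT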
What this does not give
Cancellation of running tasks. Once a worker has started, it runs to completion. The standard workaround is cooperative: pass a threading.Event or a cancel token, check it periodically, return early.
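A minimal sketch of that cooperative workaround; the sleep stands in for real work:

import threading
import time
from concurrent.futures import ThreadPoolExecutor

def worker(stop: threading.Event) -> str:
    for _ in range(1_000):
        if stop.is_set():        # cooperative check between work units
            return "cancelled early"
        time.sleep(0.01)         # placeholder for one unit of real work
    return "finished"

with ThreadPoolExecutor(max_workers=1) as pool:
    stop = threading.Event()
    fut = pool.submit(worker, stop)
    time.sleep(0.1)
    stop.set()                   # fut.cancel() would fail here: the task is running
    print(fut.result())          # prints "cancelled early"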
Cross-task communication. If task A's result feeds into task B, the chaining must be explicit. asyncio (or libraries like Dask, Ray) handle this better.
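For a simple two-step chain, the explicit version looks like this; task_a and task_b are hypothetical callables:

from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=4) as pool:
    fut_a = pool.submit(task_a)
    # No built-in "then": the caller blocks on A and hands its result to B.
    fut_b = pool.submit(task_b, fut_a.result())
    print(fut_b.result())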
Backpressure. Submit accepts as many futures as the caller pushes at it. For large fan-out, batch the submissions or use a bounded executor wrapper.
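One possible bounded wrapper, sketched with a semaphore; the class and its limit are illustrative, not a stdlib API:

import threading
from concurrent.futures import ThreadPoolExecutor

class BoundedExecutor:
    """submit() blocks once max_in_flight tasks are queued or running."""

    def __init__(self, pool: ThreadPoolExecutor, max_in_flight: int):
        self._pool = pool
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def submit(self, fn, *args, **kwargs):
        self._slots.acquire()                 # block the producer when full
        try:
            fut = self._pool.submit(fn, *args, **kwargs)
        except Exception:
            self._slots.release()             # do not leak a slot on failure
            raise
        fut.add_done_callback(lambda _: self._slots.release())
        return fut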
Primitives
- ThreadPoolExecutor (worker threads, GIL-shared)
- ProcessPoolExecutor (worker processes, separate interpreters)
- Future (result, exception, done, cancel, add_done_callback)
- as_completed (yields futures as they finish)
- wait (blocks until all done, the first completes, or the first raises; optional timeout)
Implementation
Each request blocks on the network. The GIL is released while waiting, so 32 requests can be in flight on 32 threads. The speedup over sequential is roughly min(worker count, average request latency divided by per-request CPU time); with enough workers, 30-100x is common.
from concurrent.futures import ThreadPoolExecutor, as_completed
import requests

urls = ["https://api.example.com/items/{}".format(i) for i in range(100)]

def fetch(url):
    r = requests.get(url, timeout=5)
    r.raise_for_status()
    return r.json()

with ThreadPoolExecutor(max_workers=32) as pool:
    # Order-preserving: results in url order
    results = list(pool.map(fetch, urls))

    # Or completion order, with per-future error handling:
    futures = {pool.submit(fetch, u): u for u in urls}
    for fut in as_completed(futures):
        url = futures[fut]
        try:
            data = fut.result()
        except Exception as e:
            print(f"{url} failed: {e}")

Image processing in pure Python is CPU-bound. Threads do not help (one runs at a time under the GIL). Processes do: 4 worker processes on 4 cores give roughly 4x throughput. The cost is pickling: arguments and results serialise across the process boundary.
from concurrent.futures import ProcessPoolExecutor

def process_image(path: str) -> dict:
    # CPU-heavy work: numpy ops, pure Python parsing, etc.
    with open(path, "rb") as f:
        data = f.read()
    return analyse(data)

if __name__ == "__main__":  # required under spawn (Windows, macOS default)
    with ProcessPoolExecutor(max_workers=4) as pool:
        for path, summary in zip(paths, pool.map(process_image, paths)):
            save(path, summary)

# Note: arguments and return values must be picklable.
# Local closures, lambdas, file handles will fail.

submit() returns a Future immediately. The Future supports done(), add_done_callback(), cancel() (which only succeeds if the task has not started), and result(). Exceptions raised in the worker are re-raised on result().
from concurrent.futures import ThreadPoolExecutor, TimeoutError  # futures' TimeoutError, distinct from the builtin before 3.11

with ThreadPoolExecutor(max_workers=4) as pool:
    fut = pool.submit(slow_operation, arg)

    fut.add_done_callback(lambda f: print("done:", f.result()))

    # Try to cancel; only works if the task has not started
    if not fut.cancel():
        try:
            result = fut.result(timeout=10)  # blocks up to 10s
        except TimeoutError:
            print("still running after 10s")
        except Exception as e:
            print(f"task raised: {e}")

wait offers finer control than map or as_completed. Used for hedged calls (return as soon as one succeeds), for partial-result patterns (return what is available so far), and for timed waits with custom logic.
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

def read_fastest(replicas):
    # Wrapped in a function so the early return is legal.
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = [pool.submit(read_replica, r) for r in replicas]

        done, pending = wait(futures, timeout=2.0, return_when=FIRST_COMPLETED)

        if done:
            winner = next(iter(done))
            for p in pending:
                p.cancel()  # may not stop already-running ones
            return winner.result()
        raise TimeoutError("all replicas slow")

Key points
- ThreadPoolExecutor: I/O-bound (HTTP, DB, file). The GIL is released during blocking syscalls so multiple threads make progress.
- ProcessPoolExecutor: CPU-bound (parsing, image processing, math). True parallelism, but pickling cost on every call.
- executor.map preserves input order. as_completed yields in completion order. Pick by what the downstream consumer needs.
- Always use a context manager (with). Forgetting to shut down leaks worker threads or processes.
- Default max_workers: min(32, cpu_count+4) for ThreadPoolExecutor, cpu_count for ProcessPoolExecutor. Tune to the workload.
Follow-up questions
- ThreadPoolExecutor or ProcessPoolExecutor: how to choose?
- What is the cost of ProcessPoolExecutor?
- Why not just use threading.Thread directly?
- Can a Future be cancelled mid-execution?
Gotchas
- Forgetting the `with` context manager leaks workers (especially processes, which keep file handles open)
- ProcessPoolExecutor uses spawn, not fork, on Windows (and by default on macOS since 3.8); everything that submits work belongs under the __main__ guard, as in the sketch after this list
- executor.map raises only on iteration; an exception in an early item is deferred until its result is reached
- Default max_workers is often wrong for the workload; size based on the bottleneck (cores for CPU, latency × QPS for I/O)
- cancel() never stops a running task; it only succeeds for futures still waiting in the queue, and a task a worker has already picked up runs to completion
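A minimal sketch of the spawn-safe layout from the second gotcha:

from concurrent.futures import ProcessPoolExecutor

def square(n: int) -> int:      # top-level function: picklable by name
    return n * n

if __name__ == "__main__":      # without this, spawn re-imports the module
    with ProcessPoolExecutor() as pool:  # and each worker tries to start a pool
        print(list(pool.map(square, range(8))))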