Multiprocessing Deep Dive
multiprocessing spawns separate Python processes to escape the GIL. Pool, Process, Queue, Pipe, Manager, shared memory. The catch: every argument and return value is pickled across the boundary, start methods (fork/spawn/forkserver) behave very differently, and shared state requires explicit machinery.
What it is
multiprocessing is the standard library's tool for running Python code in separate processes. It exists for one main reason: to escape the GIL.
Each process has its own Python interpreter, its own GIL, and its own memory. CPU-bound Python code that threads cannot speed up scales roughly linearly across cores with processes, minus the per-process overhead of creation, IPC, and pickling.
ProcessPoolExecutor (in concurrent.futures) is built on top of multiprocessing. Reach for the executor first; reach for raw multiprocessing only for capabilities the executor doesn't expose (custom IPC via Queue/Pipe, Manager, shared memory, streaming results with imap/imap_unordered).
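A minimal sketch of that executor-first default (the square function and worker count are illustrative, not taken from the text above):

```python
from concurrent.futures import ProcessPoolExecutor

def square(n):
    # Must be a module-level function so it can be pickled for the workers.
    return n * n

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=4) as ex:
        results = list(ex.map(square, range(10)))
    print(results)  # [0, 1, 4, ..., 81]
```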
The pickling tax
Because processes do not share memory, every argument to a worker and every return value is pickled (serialised), sent over a pipe, and unpickled in the worker. The cost is real:
- Picklable types only. Lambdas fail. Closures over local variables fail. File handles, sockets, threading.Lock fail. Anything with a broken __reduce__ fails.
- Big arguments (a 100 MB numpy array) take real time to serialise and copy.
- For small operations, pickling overhead can exceed the work, making multiprocessing slower than sequential.
The escape hatches: shared memory for big buffers (no copy), Manager for shared mutable state (every access is IPC, slow), or simply refactor to do more work per task.
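One way to see the constraint before paying for a Pool is to pickle candidate arguments directly; a small sketch (the names here are illustrative):

```python
import pickle
import threading

def module_level(x):
    # Picklable: functions pickle by reference (module path + name).
    return x + 1

def make_closure():
    y = 10
    return lambda x: x + y  # closes over a local variable

if __name__ == "__main__":
    pickle.dumps(module_level)   # fine
    pickle.dumps([1, 2, 3])      # fine: plain data pickles cheaply

    for bad in (lambda x: x + 1,     # no importable name
                make_closure(),      # closure over a local
                threading.Lock()):   # OS-level handle
        try:
            pickle.dumps(bad)
        except Exception as exc:
            print(type(bad).__name__, "->", exc)
```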
Start methods
multiprocessing has three ways to create a worker process: fork, spawn, forkserver. Choosing matters more than people realise.
fork (Unix default historically; no longer the default from Python 3.14): the child gets a copy-on-write copy of the parent's memory. Fast (no re-imports, no fresh interpreter). But it inherits everything: open file descriptors, library state, and whatever locks other threads held at fork time (only the forking thread survives in the child, so those locks can stay locked forever). Many libraries (boto3, urllib3, opencv) explicitly warn against fork after threading.
spawn (Windows always; macOS default in 3.8+): the child starts a fresh interpreter and re-imports the module. Clean, no inherited state, no fork-after-threading bugs. Slow (hundreds of ms per worker). Requires if __name__ == "__main__": guards on top-level code.
forkserver: a small server process is forked once at startup. Subsequent workers fork from it. Fast like fork, clean like spawn (the server has no library state). This becomes the default on Linux from 3.14 onward.
For new code, prefer spawn or forkserver. For legacy code on Linux that already works, fork is fine for now, but be aware of the threading rule and the upcoming default change.
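If one pool needs a specific start method without changing the process-wide default, get_context offers a per-context choice; a sketch (the task function is a placeholder, and forkserver is Unix-only):

```python
import multiprocessing as mp

def task(x):
    return x * x

if __name__ == "__main__":
    # Per-pool start method, without touching the global default.
    ctx = mp.get_context("forkserver")  # or "spawn" / "fork"
    with ctx.Pool(4) as pool:
        print(pool.map(task, range(8)))
```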
Sharing state
The default model is "no shared state". Workers communicate via Queue, Pipe, or pickled return values. This avoids most concurrency bugs: there is nothing to race on.
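A sketch of that no-shared-state style, assuming a trivial squaring task: the parent feeds a Queue, workers push results onto another, and nothing is shared.

```python
from multiprocessing import Process, Queue

def worker(tasks, results):
    # Pull until the parent sends a sentinel; everything crossing the
    # queues is pickled, so only picklable values may pass through.
    for x in iter(tasks.get, None):
        results.put(x * x)

if __name__ == "__main__":
    tasks, results = Queue(), Queue()
    procs = [Process(target=worker, args=(tasks, results)) for _ in range(2)]
    for p in procs: p.start()

    for x in range(10):
        tasks.put(x)
    for _ in procs:
        tasks.put(None)  # one sentinel per worker

    out = [results.get() for _ in range(10)]
    for p in procs: p.join()
    print(sorted(out))
```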
When shared state is genuinely required, the options are:
Manager: a server process owns the shared object. Other processes get proxies. Every access (read or write) is an IPC call. Convenient (Manager.dict, Manager.list, Manager.Lock all just work) but slow.
shared_memory: an OS-level memory region mapped into every process. Zero copy. The right tool for big numpy arrays, image buffers, or any fixed-size byte structure. Synchronisation is the caller's responsibility.
Value / Array: small fixed-size shared variables (one int, one float, an array of doubles), guarded by a Lock for synchronisation. Useful for counters and flags.
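A minimal counter sketch using Value (the increment counts are arbitrary; the lock prevents lost updates):

```python
from multiprocessing import Process, Value

def bump(counter, n):
    for _ in range(n):
        with counter.get_lock():  # Value carries its own lock
            counter.value += 1

if __name__ == "__main__":
    counter = Value("i", 0)  # shared C int, initialised to 0
    procs = [Process(target=bump, args=(counter, 10_000)) for _ in range(4)]
    for p in procs: p.start()
    for p in procs: p.join()
    print(counter.value)  # 40000, thanks to the lock
```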
For most workloads, shared state can be avoided entirely by passing data through Queue or by chunking work and combining results in the parent. Reach for shared state only after measurement shows IPC cost dominates.
When the GIL is gone
PEP 703 (free-threaded, no-GIL CPython) ships as an opt-in build in 3.13+ and is on track to become the default later this decade. On such a build, threads achieve true CPU parallelism in Python without processes.
multiprocessing does not become obsolete. It still provides fault isolation (one crashed worker does not take down the parent), memory isolation (no shared mutable state by default), and a path to multiple machines via process-based libraries. But for "just use the cores", threads become a viable option without the multiprocessing tax.
Primitives
- multiprocessing.Pool (worker pool, like ProcessPoolExecutor)
- multiprocessing.Process (single subprocess)
- multiprocessing.Queue / Pipe (IPC channels)
- multiprocessing.Manager (shared dict/list/Lock proxied across processes)
- multiprocessing.shared_memory (zero-copy bytes/numpy)
- Start methods: fork, spawn, forkserver
Implementation
Pool's API mirrors the built-in map family. map preserves order and waits for all results. imap_unordered streams results in completion order, which makes it possible to process the first finished result without waiting for slowpokes. For long-running batches, imap_unordered is often the right default.
```python
from multiprocessing import Pool

def heavy(x):
    # CPU-bound; releasing the GIL would not help
    return sum(i * i for i in range(x))

if __name__ == "__main__":  # required under spawn (Windows/macOS)
    with Pool(processes=4) as pool:
        # Order-preserving
        for r in pool.map(heavy, range(100)):
            print(r)

        # Completion order, lower latency to first result
        for r in pool.imap_unordered(heavy, range(100), chunksize=10):
            print(r)
```

Manager runs a server process that owns the shared object. Other processes get proxies. Every access is an IPC call (pickle, send, unpickle, return). Convenient but slow. For high-throughput shared state, prefer shared_memory.
```python
from multiprocessing import Manager, Process

def worker(shared_dict, key, value):
    shared_dict[key] = value  # proxied: pickled, sent, applied

if __name__ == "__main__":
    with Manager() as mgr:
        shared = mgr.dict()
        procs = [
            Process(target=worker, args=(shared, i, i * i))
            for i in range(10)
        ]
        for p in procs: p.start()
        for p in procs: p.join()
        print(dict(shared))  # snapshot
```

shared_memory (Python 3.8+) gives multiple processes a view onto the same OS-level memory region. No pickling, no copying. The right tool for big numpy arrays shared across workers.
```python
from multiprocessing import shared_memory, Process
import numpy as np

def worker(name, shape, dtype):
    shm = shared_memory.SharedMemory(name=name)
    arr = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
    arr[:] = arr * 2  # in-place, visible to all
    shm.close()       # close this handle, do not unlink

if __name__ == "__main__":
    base = np.arange(1_000_000, dtype=np.int64)
    shm = shared_memory.SharedMemory(create=True, size=base.nbytes)
    arr = np.ndarray(base.shape, dtype=base.dtype, buffer=shm.buf)
    arr[:] = base

    p = Process(target=worker, args=(shm.name, base.shape, base.dtype))
    p.start()
    p.join()

    print(arr[:5])             # doubled
    shm.close(); shm.unlink()  # the creator unlinks
```

fork: the child gets a copy-on-write copy of the parent's memory. Fast. Default on Linux until 3.14. But if the parent had threads or held locks, the child inherits half-initialised state; many libraries (boto3, urllib3) caution against fork after threading.

spawn: the child starts a fresh interpreter and re-imports the module. Slow but clean. Default on Windows, and on macOS since 3.8.

forkserver: a small server process is forked early; subsequent workers fork from it, inheriting no parent state.
```python
import multiprocessing as mp

def work(x):
    # Module-level function: importable by name, so it pickles under spawn.
    return x * 2

if __name__ == "__main__":
    # Force a specific start method (must be done once, at program start)
    mp.set_start_method("spawn")  # or "fork", or "forkserver"

    items = range(10)

    # A pool created here uses the chosen method
    with mp.Pool(4) as pool:
        print(pool.map(work, items))

        # Pickling failure: a lambda has no importable name, so it cannot
        # be pickled for the workers.
        # pool.map(lambda x: x * 2, items)  # BAD: unpicklable
```

Key points
- Each process has its own Python interpreter and its own GIL. True CPU parallelism.
- Arguments and return values must be picklable. Lambdas, closures, file handles, locks all fail.
- Start method matters: fork (historical Unix default, fast, copy-on-write) vs spawn (Windows/macOS, slower, fresh interpreter) vs forkserver.
- Shared state needs Manager (proxied, slow) or shared_memory (fast, raw bytes).
- multiprocessing.Pool covers the same ground as concurrent.futures' ProcessPoolExecutor (map plus the imap/apply variants); the executor is usually preferred.
Follow-up questions
- When is multiprocessing preferable over ProcessPoolExecutor?
- Why does code work on Linux but fail on macOS / Windows?
- Manager.dict vs shared_memory: when to pick which?
- How does PEP 703 (no-GIL Python) change this?
Gotchas
- Forgetting `if __name__ == '__main__':` under spawn (Windows/macOS) makes workers re-run the spawning code; modern Python aborts with a RuntimeError
- Lambdas, local functions, and closures cannot be pickled; use module-level callables
- Manager proxies are SLOW; every read or write is an IPC round trip
- fork after threading is undefined behaviour in many libraries; prefer spawn or forkserver
- Forgetting shm.unlink() leaks shared memory until reboot on some platforms