PEP 703: No-GIL Python
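PEP 703 makes the GIL optional. Free-threaded CPython (an experimental build option in 3.13, officially supported but still not the default in 3.14) lets multiple Python threads run bytecode in parallel, so threads finally help CPU-bound work. The cost: reference counting and container access need thread-safe machinery, single-threaded performance regresses (roughly 40% on the pyperformance suite in 3.13, roughly 5-10% in 3.14, per the CPython documentation), and C extensions need rebuilding.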
What it is
PEP 703 ("Making the Global Interpreter Lock Optional in CPython") is the proposal to let CPython be built and run without the GIL. The first user-visible result shipped in Python 3.13 (October 2024) as an experimental build option called free-threaded CPython, with the binary python3.13t.
The GIL existed largely because CPython's reference counting was not thread-safe. Removing it meant making reference counting thread-safe, making the cycle collector thread-safe, and making per-object operations (such as list and dict mutation) safe under concurrent access. That groundwork has shipped. The result: threads can run Python bytecode in parallel.
What changes
For pure Python code that already uses locks correctly, very little. The semantics of threading.Lock are unchanged. The behaviour of Queue, Event, Condition is unchanged. What changes is that two threads can actually be running Python at the same instant on different cores.
For CPU-bound workloads, the change is large. Threaded code that previously got near-zero speedup from extra threads can now scale close to linearly with core count, until contention on shared objects or locks gets in the way. The traditional advice "reach for multiprocessing for CPU-bound work in Python" stops being the only option.
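As a rough sketch of what this looks like in practice, the snippet below splits a CPU-bound sum across a thread pool using concurrent.futures.ThreadPoolExecutor. The names sum_squares and parallel_sum_squares are made up for illustration. The code runs on any build, but a speedup should only be expected on a free-threaded build with the GIL actually off.

```python
from concurrent.futures import ThreadPoolExecutor

def sum_squares(start, stop):
    """CPU-bound work on one slice of the range."""
    total = 0
    for i in range(start, stop):
        total += i * i
    return total

def parallel_sum_squares(n, workers=4):
    """Split range(n) into contiguous chunks and sum them in threads."""
    step = max(1, -(-n // workers))  # ceiling division, never zero
    bounds = [(lo, min(lo + step, n)) for lo in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda b: sum_squares(*b), bounds)
        return sum(partials)

if __name__ == "__main__":
    print(parallel_sum_squares(1_000_000))
```

Each thread works on its own slice and only the final sum touches shared data, so no lock is needed here.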
What does not change
Race conditions. A side effect of the GIL was that each individual bytecode instruction ran atomically, which papered over many sloppy programs. counter += 1 was never safe, because it compiles to a separate load, add, and store, but the window for a race was small. Without the GIL, code that assumed it was safe is more obviously broken. The bugs were always there; now they happen sooner.
Multiprocessing. It still wins on fault isolation (one worker crash does not take down the parent), memory isolation (no risk of one process corrupting another's state), and scaling across machines. Free-threaded Python competes for the "use multiple cores in one process" case, not for everything multiprocessing does.
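To see why counter += 1 is not a single step, the standard dis module can list the bytecode instructions behind it. This is only an illustrative sketch: the function name bump is made up, and the exact opcode names differ between CPython versions, so the code prints them rather than assuming them.

```python
import dis

counter = 0

def bump():
    global counter
    counter += 1

# Collect the opcode names that make up bump(). A thread switch (or, on a
# free-threaded build, a truly parallel thread) can land between the load
# and the store, which is how updates get lost.
ops = [ins.opname for ins in dis.get_instructions(bump)]
print(ops)
```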
The cost
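Single-threaded performance regresses. In Python 3.13, the free-threaded build is roughly 40% slower than standard CPython on the pyperformance suite, mostly because the specializing adaptive interpreter is disabled in that build. In 3.14, with specialization re-enabled, the CPython documentation puts the gap at roughly 5-10%. The remaining overhead comes from thread-safe reference counting (PEP 703 uses biased reference counting: cheap non-atomic updates for the thread that owns an object, atomic updates for every other thread) and from the per-object locks that protect built-in containers such as list and dict.
The CPython core team is working to close that gap. Expect incremental improvements in subsequent releases; check each release's What's New document for the actual numbers.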
C extension compatibility
This is the hard part. A C extension that mutates module-global state without locks was relying on the GIL to serialise access. On free-threaded Python, that mutation can race.
The mechanism: each C extension, once rebuilt against the free-threaded build, declares whether it is no-GIL safe via the Py_mod_gil module slot. When the runtime loads an extension that does not declare safety, it re-enables the GIL for the whole process and prints a warning.
In practice that means: until every C extension the process imports (numpy, torch, lxml, cryptography, pillow, etc.) has been audited and updated, the free-threaded process runs with the GIL on anyway. As of late 2025, the major libraries are mid-migration. Check each library's release notes.
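One way to find out which dependency switches the GIL back on is to import the modules one at a time and check the GIL status after each import. The sketch below does that; gil_status and import_and_report are hypothetical helper names, and the module list uses standard-library names only so that it runs anywhere.

```python
import importlib
import sys

def gil_status():
    """Return True/False for GIL on/off, or None on Pythons before 3.13."""
    check = getattr(sys, "_is_gil_enabled", None)
    return check() if check is not None else None

def import_and_report(module_names):
    """Import each module and record the GIL status right after the import."""
    report = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError:
            report[name] = "not installed"
            continue
        report[name] = gil_status()
    return report

if __name__ == "__main__":
    # Placeholder names; substitute the project's real dependencies.
    for name, status in import_and_report(["json", "sqlite3"]).items():
        print(f"{name}: GIL enabled after import -> {status}")
```

On a free-threaded interpreter started without a GIL override, the first module after which the status turns True is a likely candidate for the extension that has not declared support.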
What to do today
For new pure-Python projects: write thread-safe code (lock shared mutable state) so the codebase is ready when free-threaded Python is the default.
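A minimal sketch of "lock shared mutable state": keep the lock next to the data it protects, so callers cannot forget it. SafeCounter and hammer are illustrative names, not part of any library.

```python
import threading

class SafeCounter:
    """A counter whose read-modify-write is guarded by a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def increment(self, amount=1):
        with self._lock:
            self._value += amount

    @property
    def value(self):
        with self._lock:
            return self._value

def hammer(counter, times):
    """Increment the shared counter `times` times."""
    for _ in range(times):
        counter.increment()

if __name__ == "__main__":
    counter = SafeCounter()
    threads = [
        threading.Thread(target=hammer, args=(counter, 10_000))
        for _ in range(4)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(counter.value)  # 40000 on either build
```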
For existing CPU-bound Python on multiprocessing: keep using multiprocessing. The migration to threads will be smoother once free-threaded Python is the default and dependencies are no-GIL safe.
For C extension authors: read the porting guide, declare safety via Py_mod_gil, audit globals, ship updates.
For everyone: set up a test environment with python3.13t, run the tests, see what breaks. The bugs surfaced will be real bugs, regardless of whether free-threaded Python ever ships in production.
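When running tests under python3.13t, a small harness that releases many threads at the same instant makes races more likely to show up. The following is a sketch under that assumption; run_concurrently is a made-up helper, not a standard-library or pytest function.

```python
import threading

def run_concurrently(fn, n_threads=8):
    """Call fn() from n_threads threads released at the same moment.

    Returns the list of exceptions raised in worker threads (empty if none).
    """
    barrier = threading.Barrier(n_threads)
    errors = []
    errors_lock = threading.Lock()

    def worker():
        barrier.wait()  # line all threads up to maximise overlap
        try:
            fn()
        except Exception as exc:
            with errors_lock:
                errors.append(exc)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors
```

A test can then assert that run_concurrently(some_operation) returns an empty list and that the shared state ends up in the expected condition. A passing run does not prove the absence of races; it only fails to find one.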
Primitives by language
- Free-threaded build (python3.13t)
- sys._is_gil_enabled() (runtime check)
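- PYTHON_GIL=0 / -X gil=0 (force the GIL off at runtime, even if a loaded extension has not declared support; free-threaded build only. PYTHON_GIL=1 / -X gil=1 forces it on)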
- Py_mod_gil slot (C extension declares no-GIL safety)
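Two different questions are easy to conflate: "was this interpreter built free-threaded?" and "is the GIL on right now?". A free-threaded build can still be running with the GIL re-enabled. The sketch below separates them using sysconfig.get_config_var("Py_GIL_DISABLED") for the build and sys._is_gil_enabled() for the runtime state; the two function names are illustrative.

```python
import sys
import sysconfig

def is_free_threaded_build():
    """True if this interpreter was compiled with free-threading support."""
    return bool(sysconfig.get_config_var("Py_GIL_DISABLED"))

def is_gil_active():
    """True if the GIL is on right now (always True before Python 3.13)."""
    check = getattr(sys, "_is_gil_enabled", None)
    return True if check is None else check()

if __name__ == "__main__":
    print("free-threaded build:", is_free_threaded_build())
    print("GIL active now:", is_gil_active())
```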
Implementation
The same Python code runs on both builds. sys._is_gil_enabled() reports at runtime whether the GIL is currently active, which is useful for benchmarks, library compatibility checks, and conditional behaviour.
import sys

# sys._is_gil_enabled() was added in Python 3.13; older versions lack it.
if hasattr(sys, "_is_gil_enabled"):
    print("GIL enabled?", sys._is_gil_enabled())
else:
    print("Old Python (pre-3.13), GIL is always on")

# Output on free-threaded 3.13t with GIL disabled:
# GIL enabled? False

On standard CPython the code below gets near-zero speedup from threads (the GIL serialises them). On free-threaded CPython, real parallelism shows up: 4 threads on 4 cores can approach a 4x speedup for CPU-bound work.
import threading
import time

def cpu_bound(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

def run_parallel(n_threads, work_per_thread):
    threads = [
        threading.Thread(target=cpu_bound, args=(work_per_thread,))
        for _ in range(n_threads)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

# Standard CPython (GIL on): 4 threads take about 4x as long as 1 thread
#   doing the same per-thread work, because the threads run one at a time.
# Free-threaded CPython: 4 threads take about as long as 1 thread.
print(f"4 threads: {run_parallel(4, 50_000_000):.2f}s")

Without the GIL, the threading bugs that were hidden by GIL-induced serialisation become visible. The unlocked counter += 1 below was always broken; it is just more obviously broken now, racing and losing updates without a lock.
import threading

counter = 0
lock = threading.Lock()

def increment_unsafe():
    global counter
    for _ in range(100_000):
        counter += 1  # unlocked read-modify-write: races

def increment_safe():
    global counter
    for _ in range(100_000):
        with lock:
            counter += 1  # always correct

# Without the lock on free-threaded Python: counter can end up well below 400_000
# With the lock: counter exactly equals 4 * 100_000
threads = [threading.Thread(target=increment_safe) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)

A C extension that has been rebuilt for the free-threaded build but does not declare no-GIL safety is treated as GIL-dependent. When it is imported, the runtime re-enables the GIL for the whole process and prints a warning. (An extension binary compiled for standard CPython cannot be loaded at all, because the two builds have different ABIs.) Once every loaded extension declares Py_mod_gil = Py_MOD_GIL_NOT_USED, the GIL stays off.
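// In a C extension's module definition (multi-phase init).
// (Just the slots and the module def, not the full module setup.)
#include <Python.h>

static PyModuleDef_Slot mymodule_slots[] = {
#ifdef Py_GIL_DISABLED
    {Py_mod_gil, Py_MOD_GIL_NOT_USED},  /* declare safe without the GIL */
#endif
    {0, NULL}
};

static struct PyModuleDef mymoduledef = {
    PyModuleDef_HEAD_INIT,
    "mymodule",        /* m_name */
    NULL, 0, NULL,     /* m_doc, m_size, m_methods */
    mymodule_slots,    /* m_slots */
    NULL, NULL, NULL,  /* m_traverse, m_clear, m_free */
};

PyMODINIT_FUNC PyInit_mymodule(void) {
    return PyModuleDef_Init(&mymoduledef);
}

// To check at runtime:
//   python3.13t -c "import mymodule, sys; print(sys._is_gil_enabled())"
// If the extension does not declare safety, Python re-enables the GIL,
// prints a warning, and the check prints True.
// (Do not pass -X gil=0 for this check: it forces the GIL off regardless.)
Key points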
- Free-threaded Python (3.13+) builds without the GIL. Threads achieve true parallelism.
- Only available in a separate build (python3.13t). Default CPython still has the GIL.
- C extensions must opt in via Py_mod_gil. Rebuilt extensions that do not opt in trigger a re-enabling of the GIL.
- Single-threaded performance regresses (roughly 40% on pyperformance in 3.13, roughly 5-10% in 3.14) due to thread-safe reference counting and per-object locking.
- Multi-threaded scaling on CPU-bound code finally works without multiprocessing.
Follow-up questions
- Should one switch to free-threaded Python today?
- Does this make multiprocessing obsolete?
- Will existing threaded code break?
- What changes for library authors?
Gotchas
- Single-threaded performance regresses on the free-threaded build (roughly 40% in 3.13, roughly 5-10% in 3.14; thread-safe reference counting and per-object locking overhead)
- Loading any GIL-required C extension re-enables the GIL process-wide
- Race conditions hidden by GIL serialisation become visible; correctness audits matter
- Free-threaded build is python3.13t (separate binary and ABI); prebuilt wheels on PyPI are limited
- Some libraries gate behaviour on sys.version; check for the free-threaded build separately, for example with sysconfig.get_config_var("Py_GIL_DISABLED")