Wait-Free vs Lock-Free vs Obstruction-Free
Three progress guarantees. Wait-free: every thread completes in bounded steps regardless of others. Lock-free: at least one thread makes progress at any time (system-wide progress). Obstruction-free: a thread makes progress if it runs alone for long enough. Strict ordering: wait-free > lock-free > obstruction-free > blocking.
Imagine a running track
Picture threads as runners on a track. Each runner is trying to finish a single lap, where a lap is one operation on a shared data structure (a push, a pop, a counter increment). The four progress levels are different promises about who is guaranteed to cross the finish line.
| Level | Promise | What it means in plain words |
|---|---|---|
| Wait-free | Every runner finishes in a bounded number of steps. | No matter how the other runners behave, each runner crosses the line within a fixed amount of time. Hardest to design. |
| Lock-free | At least one runner is always making progress. | Somebody is always finishing laps. Individual runners may keep losing the race and retrying forever, but the system never stalls. |
| Obstruction-free | A runner finishes if she gets the track to herself for long enough. | If two runners keep tripping each other, neither one finishes. Nobody is blocked, but nobody makes progress either. This is livelock. |
| Blocking (lock-based) | A runner holding the baton can stop running mid-lap and everyone behind waits. | If the holder pauses for any reason (GC, OS preemption, slow disk, page fault), every other runner on that lap freezes until the holder resumes. |
The four levels form a strict hierarchy from strongest to weakest: wait-free → lock-free → obstruction-free → blocking. Each step down the ladder gives a weaker progress promise but is easier to build.
Why lock-free is the practical bar
Lock-free is the level that almost every production library labelled "lock-free" actually delivers. The shape of the code is the CAS retry loop: read the current value, compute the next value, try to install it with a compare-and-swap, and on failure go back to the start. At any moment, exactly one thread's CAS succeeds and the system moves forward by one step. The threads that lose just retry.
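That retry-loop shape, in a minimal form: a lock-free counter increment on Java's `AtomicLong`. (`AtomicLong.getAndIncrement` already does this internally via hardware atomics; the explicit loop here is a sketch to make the read/compute/CAS/retry structure visible.)

```java
import java.util.concurrent.atomic.AtomicLong;

class CasCounter {
    private final AtomicLong value = new AtomicLong();

    // The canonical lock-free shape: read, compute, CAS, retry on failure.
    long increment() {
        while (true) {
            long current = value.get();      // read the current value
            long next = current + 1;         // compute the next value
            if (value.compareAndSet(current, next)) {
                return next;                 // our CAS won: done
            }
            // CAS failed: some other thread's CAS won and the counter
            // advanced. The system made progress; we retry from the top.
        }
    }

    long get() { return value.get(); }
}
```

Note that a failed CAS is not wasted work system-wide: it fails precisely because another thread succeeded.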
The reason this is enough for production code is that the system is never stuck behind a single thread. If a thread that just ran a successful CAS gets suspended right after (a GC pause, a page fault, a syscall), the suspension does not block anybody else, because the suspended thread is not holding a lock. Some other thread keeps winning CAS races and the data structure keeps advancing.
The cost is that an individual thread can lose its CAS many times in a row. The contract is "system-wide progress", not "per-thread fairness". For nearly every production system that wants to avoid lock holders pausing everyone else, this is the right trade.
Why wait-free is so much harder
A plain CAS retry loop is not wait-free. A single thread can keep losing forever if it has bad luck and faster threads keep winning. To make every thread finish in a bounded number of steps, three extra pieces are usually needed:
- Operation descriptors. Every operation a thread wants to do is first published as a small record in shared memory. The record says "this thread wants to do this operation on this data structure". Other threads can see it.
- Helping. Before doing its own work, a thread checks whether another thread has an operation in progress. If so, the current thread completes that other thread's operation first, then does its own. The slow thread, when it finally resumes, sees that its work is already done and returns immediately.
- Generation counters. These distinguish operations that are still pending from operations that have already been completed (perhaps by a helper). Without them, the same operation can get "helped" twice and produce a wrong result.
These three pieces working together are what turns "somebody is always making progress" into "every thread finishes within a fixed bound". The famous Kogan-Petrank wait-free queue (2011) is around five to ten times more code than the equivalent lock-free Michael-Scott queue, with substantially worse constant factors. Production systems almost never pay that cost; lock-free with backoff is the practical answer.
A picture of helping
The difference between lock-free and wait-free shows up in one comparison: what happens to a slow thread under sustained contention. Under lock-free, the slow thread keeps losing and retrying, with no upper bound on how long that takes. Under wait-free, the fast threads see the slow thread's published descriptor and complete its operation for it.
Helping is the trick that turns "somebody makes progress" into "everybody makes progress". It is also the reason wait-free code is so much heavier: every operation has to be expressible as a descriptor, every thread has to check for and finish other threads' descriptors before its own work, and the bookkeeping has to handle the case where an operation is already done by the time the original thread looks.
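A minimal sketch of the descriptor-plus-helping-plus-generation machinery, stripped down to a shared counter. This is illustrative only, not a full wait-free algorithm (the class and method names `HelpingCounter`, `publish`, and `helpAll` are invented for this sketch): each thread announces a pending increment as a positive operation id, any thread may complete it, and the CAS that flips the id to its negation is the generation check that guarantees the operation is applied exactly once.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicLongArray;

class HelpingCounter {
    final AtomicLong value = new AtomicLong();
    // One announcement slot per thread. A positive value is a pending
    // operation id (the descriptor); its negation means "already done".
    final AtomicLongArray announce;

    HelpingCounter(int nThreads) {
        announce = new AtomicLongArray(nThreads);
    }

    // Publish a descriptor: "thread tid wants one increment, op id opId".
    void publish(int tid, long opId) {
        announce.set(tid, opId);
    }

    // Helping: scan every descriptor and complete any pending operation.
    // The CAS from opId to -opId is the generation check: only one helper
    // can win it, so each increment is applied exactly once even if many
    // threads (or the same thread twice) try to help.
    void helpAll() {
        for (int i = 0; i < announce.length(); i++) {
            long v = announce.get(i);
            if (v > 0 && announce.compareAndSet(i, v, -v)) {
                value.incrementAndGet();
            }
        }
    }
}
```

Even in this toy form, the overhead of the wait-free style is visible: every operation costs a publish, a full scan of other threads' slots, and a CAS per pending descriptor, versus a single CAS in the plain lock-free loop.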
What "lock-free" means in the wild
Most libraries that call themselves lock-free really are lock-free in the formal sense. Some are looser, using the term to mean "does not call pthread_mutex_lock". Two specific things are worth checking before trusting the label:
- Spin-locks are not lock-free. A library that internally uses a spin-lock is blocking. If the spin-lock holder pauses (GC, page fault, preemption), every caller spinning on the lock stalls. The fact that there is no mutex object in the API does not change this.
- Allocator dependence. A lock-free data structure that calls `malloc` on every push is only lock-free if the allocator is also lock-free. Production libraries like Folly and libcds are careful about this and either use lock-free allocators or pre-allocate. Ad-hoc lock-free implementations often skip the check and accidentally inherit the allocator's blocking behaviour.
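The spin-lock point can be made concrete with a hypothetical minimal sketch: a test-and-set lock built from nothing but CAS. No mutex object appears anywhere in it, yet it is still a blocking construct by the formal definition.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// A spin-lock built from CAS alone. Despite using only atomics, this is
// *blocking*: if the holder is suspended between lock() and unlock()
// (GC pause, page fault, preemption), every other caller spins with no
// system-wide progress.
class SpinLock {
    private final AtomicBoolean held = new AtomicBoolean(false);

    void lock() {
        while (!held.compareAndSet(false, true)) {
            Thread.onSpinWait(); // CPU hint only; does not release anything
        }
    }

    boolean tryLock() {
        return held.compareAndSet(false, true);
    }

    void unlock() {
        held.set(false);
    }
}
```

The litmus test is not "does the code use CAS" but "can one suspended thread prevent all others from completing" — here it can, so the structure is blocking.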
When wait-free is actually needed
The set of cases where wait-free is genuinely worth the constant-factor cost is small.
- Hard real-time systems. Avionics, automotive control, medical devices, anything where a missed deadline is unacceptable. Every thread has to finish in a bounded amount of time and lock-free's "individual threads may starve" property is not OK.
- Signal handlers and similar restricted contexts. Some operating system contexts forbid blocking entirely (signal handlers on POSIX, certain interrupt handlers, parts of garbage collectors). Wait-free is one safe option here, though lock-free with bounded retries usually works too.
- Adversarial schedulers. Environments where one thread can be preempted arbitrarily for arbitrarily long periods (cooperative schedulers, virtualised environments, JVM safepoints). Wait-free is the only level that protects against unbounded starvation under adversarial scheduling.
For everything else, lock-free is the practical target, and for most application code even lock-free is over-engineering. Mutexes are simpler, often faster under low contention, and almost always correct on the first try.
The practical rule
Use locks for application code. Reach for a lock-free library once measurement shows lock contention is the actual bottleneck. Do not write custom lock-free data structures without a research-grade reason and a stress-test suite (jcstress for the JVM, loom for Rust, TSan for C/C++). Wait-free is for hard real-time and signal contexts; almost no application code needs it.
Implementations
Obstruction-free algorithms make progress when running alone. Combined with exponential backoff on contention, threads spread out, eventually run alone, and complete. Used in some software transactional memory implementations.
```java
// Obstruction-free update with exponential randomized backoff.
// tryUpdate is an assumed helper that succeeds only if no other
// thread interfered during the attempt.
void update(int newVal) {
    int backoffMs = 1;
    while (true) {
        if (tryUpdate(newVal)) return;
        try {
            // Random sleep spreads contending threads out so that each
            // eventually gets the structure to itself and completes.
            Thread.sleep(ThreadLocalRandom.current().nextInt(backoffMs) + 1);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            return;
        }
        backoffMs = Math.min(backoffMs * 2, 100); // cap the backoff
    }
}
```
Key points
- Wait-free: hardest. Bounded per-thread steps. No starvation possible.
- Lock-free: at least one thread makes progress always. Individual threads may starve.
- Obstruction-free: a thread makes progress if no other thread interferes. Helped by exponential backoff.
- Blocking (lock-based): a thread holding a lock can pause everyone else indefinitely.
- Most production 'lock-free' code is actually lock-free, not wait-free. Wait-free is a research goal; lock-free is the practical bar.
Follow-up questions
- Why is wait-free so much harder than lock-free?
- Is lock-free always faster than locks?
- What is helping?
- When would I actually want wait-free?
Gotchas
- 'Lock-free' marketing often means 'doesn't block on a mutex' rather than the formal definition.
- Lock-free does NOT prevent individual thread starvation.
- Wait-free algorithms have bad constant factors; rarely worth it in practice.
- Obstruction-free without backoff means livelock under contention.
- Blocking calls inside a lock-free data structure (e.g., malloc) defeat the guarantee.