Spurious and Stolen Wakeups
A waiting thread can wake up without anyone calling notify (spurious), or be beaten to the lock by another thread that consumed the condition (stolen). The defence is to always re-check the condition in a while loop after wait returns, never with an if.
The two ways a wakeup can lie
A waiting thread sleeps on a condition variable until someone calls signal. Two surprises can happen between "the thread woke up" and "the thing it was waiting for is true."
Spurious wakeup. Nobody called signal, but the thread woke up anyway. POSIX permits this, Java's Object.wait and Condition documentation say so explicitly, and the Python and Go docs likewise tell callers to re-check in a loop. The runtime is allowed to wake "just because" so that its own implementation can be cheap.
Stolen wakeup. Someone really did call signal because the queue had an item. The thread woke up. But before it could re-acquire the lock, another waiter beat it to the lock, took the item, and released the lock. By the time the original thread holds the lock, the queue is empty again.
Both look the same in code: the thread wakes up, the condition is false. Same fix: re-check it, in a loop, with the lock held.
A timeline that shows the bug
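One producer, two consumers (T1 and T2), shared queue. Both consumers are already asleep on cv.wait() (the wait call atomically released the lock before sleeping). The producer uses notifyAll() so both consumers wake. In order:
1. The producer acquires the lock, adds one item, calls notifyAll(), and releases the lock.
2. T1 and T2 both wake and race to re-acquire the lock.
3. T1 wins, removes the item, and releases the lock.
4. T2 finally acquires the lock. The queue is empty.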
The race window is the gap between "wake" and "has the lock." T2 was woken legitimately, but by the time it actually holds the lock, the world has changed. A single waiter that wakes spuriously ends up in the same place by a different route: the wake is real, but nothing ever made the predicate true. The while loop is the only thing that protects against either case.
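The scenario above can be replayed with Java's built-in monitor (synchronized, wait(), notifyAll()). This is a minimal sketch: the class StolenWakeupDemo and its sleeping counter are hypothetical, and the counter exists only so the driver can wait until both consumers are parked before the producer adds an item. Because take() uses while, whichever consumer loses the race goes back to sleep instead of calling removeFirst() on an empty queue.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

// Hypothetical demo class: one producer, two consumers, notifyAll(), one item at a time.
class StolenWakeupDemo {
    private final Deque<Integer> queue = new ArrayDeque<>();
    private int sleeping = 0; // consumers currently parked inside wait()

    synchronized void put(int item) {
        queue.addLast(item);
        notifyAll(); // wakes every waiter, even though only one item arrived
    }

    synchronized int take() throws InterruptedException {
        while (queue.isEmpty()) { // with `if`, the loser would reach removeFirst() on an empty queue
            sleeping++;
            try {
                wait(); // releases the monitor, sleeps, re-acquires it before returning
            } finally {
                sleeping--;
            }
        }
        return queue.removeFirst();
    }

    synchronized int sleepingConsumers() {
        return sleeping;
    }

    // Both consumers sleep, one item arrives, both wake; the loser must go back to sleep.
    static List<Integer> run() throws InterruptedException {
        StolenWakeupDemo buf = new StolenWakeupDemo();
        List<Integer> taken = Collections.synchronizedList(new ArrayList<>());
        Runnable consumer = () -> {
            try {
                taken.add(buf.take());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        Thread t1 = new Thread(consumer, "T1");
        Thread t2 = new Thread(consumer, "T2");
        t1.start();
        t2.start();
        while (buf.sleepingConsumers() < 2) Thread.sleep(1); // both are inside wait()
        buf.put(1);                                         // one item, two wakeups
        while (taken.size() < 1) Thread.sleep(1);           // the winner has taken it
        buf.put(2);                                         // the loser, still looping, takes this one
        t1.join();
        t2.join();
        List<Integer> result = new ArrayList<>(taken);
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run()); // prints [1, 2]
    }
}
```

Swapping the while for an if would let the loser's removeFirst() throw NoSuchElementException whenever it re-acquires the monitor before the second put().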
What wait actually does
wait bundles four steps into one call. Steps 1 and 2 happen atomically, so no signal can slip in between releasing the lock and going to sleep:
1. release the held lock
2. park the thread (sleep)
3. on wakeup, re-acquire the lock
4. return
After step 4, the lock is held again. But nothing is known about what happened during steps 2 and 3. Other threads ran. State changed. The predicate must be re-checked.
That's the entire reason the while exists.
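One way to make the release and re-acquire steps observable, sketched with java.util.concurrent.locks.ReentrantLock and Condition. The class AwaitStepsDemo and its flag field are hypothetical. The helper thread can only get the lock while the main thread is parked inside await(), and after the loop exits, isHeldByCurrentThread() confirms the lock was re-acquired before await() returned.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical demo: makes the release and re-acquire steps of await() observable.
class AwaitStepsDemo {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition changed = lock.newCondition();
    private boolean flag = false; // the predicate; only read or written while holding `lock`

    // Returns true if the caller holds the lock again once the wait loop exits.
    static boolean run() throws InterruptedException {
        AwaitStepsDemo d = new AwaitStepsDemo();
        Thread helper = new Thread(() -> {
            d.lock.lock(); // succeeds only while the main thread is parked in await()
            try {
                d.flag = true;
                d.changed.signal();
            } finally {
                d.lock.unlock();
            }
        });
        boolean heldAfterWait = false;
        d.lock.lock();
        try {
            helper.start(); // started while we hold the lock, so its lock() waits for await()
            while (!d.flag) {
                d.changed.await(); // release lock, park, re-acquire lock, return
            }
            heldAfterWait = d.lock.isHeldByCurrentThread();
        } finally {
            d.lock.unlock();
        }
        helper.join();
        return heldAfterWait;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run()); // prints true
    }
}
```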
The pattern, in one shape
lock()
while (condition is not true):
    wait(cond)
# now the condition is true AND the lock is held
do the work
unlock()
pthreads, Java, Python, and Go condition variables all use this same shape. Memorise it and both spurious and stolen wakeups are handled without thinking.
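The same shape in Java with a counter as the hand-maintained predicate. This is a sketch: CountdownGate is a hypothetical, simplified stand-in for java.util.concurrent.CountDownLatch, which real code should prefer. The timed variant relies on Condition.awaitNanos returning the time remaining, so an early wakeup re-enters the loop with a smaller budget instead of restarting the full timeout.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical class: the predicate (count == 0) is maintained by hand,
// so every wait sits inside a while loop.
class CountdownGate {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition reachedZero = lock.newCondition();
    private int count;

    CountdownGate(int count) {
        this.count = count;
    }

    void countDown() {
        lock.lock();
        try {
            if (count > 0 && --count == 0) {
                reachedZero.signalAll(); // every waiter's predicate is now true
            }
        } finally {
            lock.unlock();
        }
    }

    // The pattern: lock, while (!predicate) wait, do the work, unlock.
    void await() throws InterruptedException {
        lock.lock();
        try {
            while (count > 0) {
                reachedZero.await();
            }
        } finally {
            lock.unlock();
        }
    }

    // Timed variant: an early wakeup re-checks with whatever time is left.
    boolean await(long timeout, TimeUnit unit) throws InterruptedException {
        long nanos = unit.toNanos(timeout);
        lock.lock();
        try {
            while (count > 0) {
                if (nanos <= 0) return false; // timed out, predicate still false
                nanos = reachedZero.awaitNanos(nanos);
            }
            return true;
        } finally {
            lock.unlock();
        }
    }

    static boolean demo() throws InterruptedException {
        CountdownGate gate = new CountdownGate(3);
        boolean early = gate.await(10, TimeUnit.MILLISECONDS); // nobody counted down yet
        Thread worker = new Thread(() -> {
            for (int i = 0; i < 3; i++) gate.countDown();
        });
        worker.start();
        gate.await(); // returns once count reaches zero
        worker.join();
        boolean late = gate.await(10, TimeUnit.MILLISECONDS); // already open
        return !early && late;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(demo()); // prints true
    }
}
```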
What does not need a while
A primitive that manages its own state internally (semaphore, channel, blocking queue) returns only once it has actually claimed something: a permit, a value, an item. Any re-check loop lives inside the primitive, so there is no separate predicate for the caller to re-check.
sema.acquire() # blocks until count > 0; no while needed
<-ch # blocks until a value is sent; no while needed
queue.take() # blocks until non-empty; no while needed
The while is only for the case where the predicate is maintained by hand: a condition variable plus a flag, a counter, or a state field that gets checked manually.
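For contrast, a sketch that uses the library primitives directly: java.util.concurrent's ArrayBlockingQueue and Semaphore (the class NoLoopNeededDemo is hypothetical). Neither the consumers nor the driver writes a while loop around a wait, because take() and acquire() return only after they have claimed an item or a permit.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Semaphore;

// Hypothetical demo class: two consumers drain ten items through library primitives.
class NoLoopNeededDemo {
    static int run() throws InterruptedException {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(4);
        Semaphore finished = new Semaphore(0); // one permit released per finished consumer
        int[] sums = new int[2];               // each consumer writes only its own slot

        for (int c = 0; c < 2; c++) {
            final int id = c;
            new Thread(() -> {
                try {
                    for (int i = 0; i < 5; i++) {
                        sums[id] += queue.take(); // blocks until non-empty; no while needed
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    finished.release();
                }
            }).start();
        }
        for (int i = 1; i <= 10; i++) {
            queue.put(i); // blocks while the queue is full; no while needed
        }
        finished.acquire(2); // blocks until both consumers are done; no while needed
        return sums[0] + sums[1]; // release/acquire makes the consumers' writes visible
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run()); // prints 55
    }
}
```

The loop has not disappeared, it has moved: in OpenJDK, ArrayBlockingQueue.take() itself waits in a while loop on a Condition.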
How this shows up in interviews
Asked under several different names:
- "Why does wait need to be in a loop?"
- "What is a spurious wakeup?"
- "Why does this producer-consumer code occasionally throw NoSuchElementException under load?"
All the same answer: between "wake up" and "do work" the world can change, so re-check.
The senior version: spurious wakeups exist because the runtime reserves the right to wake a waiter without a matching signal (ruling that out completely would make the implementation slower, especially on multiprocessors). Stolen wakeups exist because the lock is released during sleep and another thread can win the race for it on the way out. The while handles both without needing to tell them apart.
Implementations
The version with if looks reasonable. It is broken. After wait returns, the queue might be empty again because of a spurious wakeup or because another consumer grabbed the item. The if branch falls through and removeFirst runs on an empty queue.
// BROKEN
public T take() throws InterruptedException {
    lock.lock();
    try {
        if (queue.isEmpty()) cond.await(); // ← if
        return queue.removeFirst();        // may throw NoSuchElementException
    } finally { lock.unlock(); }
}

The while re-checks the predicate every time the thread wakes. If the wakeup was spurious, the loop puts the thread back to sleep. If another consumer beat it to the item, it goes back to sleep too. By the time it exits the loop, the predicate is true and the lock is held.
public T take() throws InterruptedException {
    lock.lock();
    try {
        while (queue.isEmpty()) cond.await(); // ← while
        return queue.removeFirst();
    } finally { lock.unlock(); }
}

Key points
- Spurious wakeup: the OS or runtime wakes a waiting thread for no reason. Allowed by every POSIX-shaped wait API.
- Stolen wakeup: someone signalled, the thread woke up, but another thread grabbed the resource before the first could re-acquire the lock.
- Both are why every pthread / Java / Python tutorial says 'wait inside while, never if.'
- wait/await releases the lock atomically, parks the thread, re-acquires the lock on wake. The re-check happens after re-acquire.
- Channels and semaphores avoid the issue because they have an integer count, not a free-form predicate.
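To see the fixed take() in context, here is a minimal bounded buffer sketch. The class BoundedBuffer, its put() method, and the notFull/notEmpty condition names are assumptions for illustration; the shape follows the BoundedBuffer example in the java.util.concurrent.locks.Condition documentation. Both methods use the same while pattern, each with its own predicate.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical class: one lock, two conditions, a while loop around every await().
class BoundedBuffer<T> {
    private final Deque<T> queue = new ArrayDeque<>();
    private final int capacity;
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition notEmpty = lock.newCondition();
    private final Condition notFull = lock.newCondition();

    BoundedBuffer(int capacity) { this.capacity = capacity; }

    public void put(T item) throws InterruptedException {
        lock.lock();
        try {
            while (queue.size() == capacity) notFull.await(); // same loop, other predicate
            queue.addLast(item);
            notEmpty.signal(); // one item added: wake one consumer
        } finally { lock.unlock(); }
    }

    public T take() throws InterruptedException {
        lock.lock();
        try {
            while (queue.isEmpty()) notEmpty.await(); // ← while
            T item = queue.removeFirst();
            notFull.signal(); // one slot freed: wake one producer
            return item;
        } finally { lock.unlock(); }
    }

    // Stress run: 4 producers and 4 consumers through a buffer of capacity 2.
    static long stress() throws InterruptedException {
        final int workers = 4, perWorker = 10_000;
        BoundedBuffer<Integer> buf = new BoundedBuffer<>(2);
        AtomicLong sum = new AtomicLong();
        Thread[] threads = new Thread[2 * workers];
        for (int w = 0; w < workers; w++) {
            threads[w] = new Thread(() -> { // producer: puts 1..perWorker
                try {
                    for (int i = 1; i <= perWorker; i++) buf.put(i);
                } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            });
            threads[workers + w] = new Thread(() -> { // consumer: takes perWorker items
                try {
                    for (int i = 0; i < perWorker; i++) sum.addAndGet(buf.take());
                } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            });
        }
        for (Thread t : threads) t.start();
        for (Thread t : threads) t.join();
        return sum.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(stress()); // prints 200020000 (4 × the sum of 1..10000)
    }
}
```

Using signal() rather than signalAll() works here because producers and consumers wait on separate conditions. The while loops still matter: they cover spurious wakeups and a thread that barges in and takes the item ahead of the one that was signalled.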