Race Conditions: Why Concurrent Bugs Happen
A race condition is when the program's correctness depends on the unpredictable interleaving of two or more threads' operations on shared state. Fix it by removing the sharing, or by making the operations atomic with a lock or atomic primitive.
What it is
A race condition is a bug whose outcome depends on the order in which threads' operations end up interleaving. Run the same input twice and the answer can differ. The program is no longer deterministic, and that's a correctness disaster.
The bug isn't "slow" or "occasionally weird." It's "sometimes returns the wrong answer, and it's not reproducible locally."
A picture of the classic race
Counter starts at 5. Two threads each run counter++. The OS interleaves the underlying load/add/store, for example:

t1  Thread A: load counter (reads 5)
t2  Thread B: load counter (reads 5)
t3  Thread A: increment in register (6)
t4  Thread B: increment in register (6)
t5  Thread A: store 6
t6  Thread B: store 6, Thread A's increment is lost

The bug is the gap between "load" and "store." Either thread can step inside that gap and corrupt the result. With a different schedule (rare on dev laptops, common at peak load), a different bad outcome surfaces.
Why it matters
Race conditions cause the worst class of production bugs:
- Tests pass. Unit tests run sequentially with predictable timing. Races need contention.
- Logs lie. The race interleaving usually doesn't get logged. Postmortem reads as "everything looks fine."
- They scale with load. A race that fires once per million ops is invisible at 100 req/sec, painful at 1M req/sec.
- They're worse on ARM than x86. Code that "works" on dev laptops can break on production servers.
The infamous Knight Capital incident: a 2012 deployment race activated a dormant order-routing code path; the algorithm bought roughly $7B of stock in 45 minutes. Knight lost $440M and never recovered as an independent firm. The race was in the deployment process, not the trading code itself.
How they happen, the three ingredients
Every race needs all three:
- Shared mutable state, a variable, struct field, map, or buffer that more than one thread can touch.
- Multiple threads, two or more execution units that can run "at the same time" (concurrent or parallel).
- Unsynchronized access, at least one writer, with no lock/atomic/channel/happens-before edge ordering the operations.
Remove ANY one and the race vanishes
- Make the state immutable → no shared mutable state.
- Confine the state to one thread → no multiple threads touching it.
- Add synchronization (lock, atomic, channel) → access is no longer unsynchronized.
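The first two removals can be sketched together in Java (class and method names here are illustrative): each thread owns a private slot that no other thread touches, and the main thread merges the partial counts only after join(), which establishes the happens-before edge that makes the writes visible.

```java
public class PerThreadCounter {
    public static long count(int threads, int incrementsPerThread) throws InterruptedException {
        long[] partial = new long[threads];          // one slot per thread, never shared
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            final int slot = i;
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    partial[slot]++;                 // only this thread touches this slot
                }
            });
            workers[i].start();
        }
        long total = 0;
        for (int i = 0; i < threads; i++) {
            workers[i].join();                       // happens-before: slot writes now visible
            total += partial[i];
        }
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(count(8, 100_000));       // prints 800000, every run
    }
}
```

No lock, no atomic, and yet no race: the three ingredients never coexist, because no mutable slot is touched by more than one thread.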
Why counter++ is the textbook example
counter++ looks atomic. It isn't. The compiler emits three instructions:
- Load counter from memory into a register.
- Increment the register.
- Store the register back to memory.
Each of those three instructions is atomic on its own. The sequence of three is not. Any other thread can interleave between the load and the store: both threads load 5, both store 6. Two increments ran, but the counter went from 5 to 6, not 7. One update was lost.
This read-modify-write pattern is everywhere: incrementing counters, appending to lists, updating cache entries, updating linked-list pointers. All of them are races without synchronization.
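Cache updates are a read-modify-write (check-then-act) in disguise. A sketch in Java, with illustrative names, showing the racy shape next to the atomic one:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CacheExample {
    private final Map<String, Integer> cache = new ConcurrentHashMap<>();

    // RACY even on a concurrent map: containsKey and put are two separate
    // operations, so another thread can insert between them. The map itself
    // stays intact, but the computation may run twice and overwrite a value.
    Integer getRacy(String key) {
        if (!cache.containsKey(key)) {
            cache.put(key, expensiveCompute(key));
        }
        return cache.get(key);
    }

    // SAFE: computeIfAbsent performs the check and the insert as one
    // atomic operation on the map.
    Integer getSafe(String key) {
        return cache.computeIfAbsent(key, CacheExample::expensiveCompute);
    }

    static Integer expensiveCompute(String key) {
        return key.length();   // placeholder for real work
    }
}
```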
How to fix them
The fix hierarchy
- Eliminate sharing, best. Use immutable data, copy-on-write, or per-thread state with a final merge. No race possible.
- Confine state, second best. One thread owns the data; others communicate via channels/queues. ("Share by communicating.")
- Atomic primitive, for single-variable updates: AtomicInteger, sync/atomic, or a threading.Lock-wrapped read-modify-write.
- Lock the critical section, for multi-step invariants. Cheapest mental model, costliest at runtime.
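The confinement option ("share by communicating") can be sketched as a single owner thread with a mailbox queue. This ConfinedCounter is a hypothetical illustration: value is touched by exactly one thread, and every read or write travels through the queue.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;

public class ConfinedCounter {
    private final BlockingQueue<Runnable> mailbox = new LinkedBlockingQueue<>();
    private long value = 0;                        // owned by ONE thread, never shared

    public ConfinedCounter() {
        Thread owner = new Thread(() -> {
            try {
                while (true) mailbox.take().run(); // owner processes messages serially
            } catch (InterruptedException ignored) { }
        });
        owner.setDaemon(true);
        owner.start();
    }

    public void inc() {                            // callable from any thread
        mailbox.add(() -> value++);                // the ++ always runs on the owner
    }

    public long get() {
        CompletableFuture<Long> reply = new CompletableFuture<>();
        mailbox.add(() -> reply.complete(value));  // reads go through the owner too
        return reply.join();
    }

    public static void main(String[] args) throws InterruptedException {
        ConfinedCounter c = new ConfinedCounter();
        Thread[] ts = new Thread[4];
        for (int i = 0; i < ts.length; i++) {
            ts[i] = new Thread(() -> { for (int j = 0; j < 1000; j++) c.inc(); });
            ts[i].start();
        }
        for (Thread t : ts) t.join();
        System.out.println(c.get());               // prints 4000, every run
    }
}
```

The queue provides both the ordering and the happens-before edges, so value++ needs no lock at all.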
How to find them before they bite
- Race detector: go run -race, ThreadSanitizer (clang/gcc), Java's jcstress. Slow in CI, priceless.
- Property-based tests: run the operation N times concurrently, assert invariants.
- Code review checklist: for every shared mutable field, is it volatile/atomic, locked, or confined?
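The property-based idea can be sketched as a small harness (hypothetical, not a real library): release every thread at once via a latch to maximise contention, run the operation many times, then let the caller assert the invariant.

```java
import java.util.concurrent.CountDownLatch;

public class StressTest {
    public static void run(Runnable op, int threads, int opsPerThread) throws InterruptedException {
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    start.await();                 // everyone blocks here...
                } catch (InterruptedException e) {
                    return;
                }
                for (int j = 0; j < opsPerThread; j++) op.run();
            });
            workers[i].start();
        }
        start.countDown();                         // ...then all start together
        for (Thread w : workers) w.join();
    }
}
```

With an AtomicInteger::incrementAndGet the final count is exact; swap in an unsynchronized ++ and the same harness tends to expose the lost updates.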
"It works on my machine" Race conditions love x86 (strong memory model) and break on ARM (relaxed memory model). The same code can pass on Intel CI and fail on Apple Silicon laptops. Test on both.
Implementations
counter++ reads the value, increments, writes back, three instructions. Two threads can both read the same value, both increment, both write, and one update vanishes. This is the textbook race condition.
class Counter {
    int value = 0;              // shared mutable state
    void inc() { value++; }     // race
}

Counter c = new Counter();
// 1000 threads each call c.inc() 1000 times
// Expected: 1,000,000. Actual: less.

synchronized enforces that only one thread executes the method body at a time, AND establishes happens-before so the increment is visible to other threads.
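A runnable sketch of the racy version (thread and iteration counts are arbitrary; the exact deficit varies run to run):

```java
public class LostUpdateDemo {
    static int value = 0;                       // shared mutable state, unsynchronized

    public static void main(String[] args) throws InterruptedException {
        Thread[] ts = new Thread[8];
        for (int i = 0; i < ts.length; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < 100_000; j++) value++;   // racy read-modify-write
            });
            ts[i].start();
        }
        for (Thread t : ts) t.join();
        // Expected 800000; on most runs the total is lower because updates are lost.
        System.out.println(value);
    }
}
```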
class Counter {
    int value = 0;
    synchronized void inc() { value++; }
}

AtomicInteger.incrementAndGet() is a single CAS-based operation, no lock, no contention storm, faster under load.
class Counter {
    AtomicInteger value = new AtomicInteger(0);
    void inc() { value.incrementAndGet(); }
}

Key points
- A race condition is a CORRECTNESS bug, not a 'maybe slow' bug, the wrong answer can be observed
- Three ingredients: shared mutable state, multiple threads, unsynchronized access
- Remove ANY one of the three and the race goes away
- Read-modify-write operations like counter++ are NEVER atomic, they're three instructions
- Compilers and CPUs reorder reads/writes, what looks sequential in source isn't
- Race conditions hide on x86 (strong memory model) and explode on ARM
Follow-up questions
▸ What's the difference between a data race and a race condition?
▸ Why does counter++ fail even with volatile?
▸ If code 'works' on x86, is there still a race?
▸ What's the simplest way to remove a race?
▸ Are read-only races okay?
Gotchas
- Tests almost always pass, races need real load and timing to manifest
- x86 hides races that ARM exposes, test on both
- Even atomic.Add doesn't help with check-then-act: if (atomic.Load() == 0) atomic.Store(1) is racy
- The Go race detector slows code 5-10x, run it in CI, not production
- Java's volatile is for visibility, not atomicity of compound ops
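The check-then-act gotcha has an atomic cure: compare-and-swap collapses the check and the act into a single operation. A minimal sketch (class name is illustrative):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class Once {
    private final AtomicInteger state = new AtomicInteger(0);

    // Racy version, do NOT do this: another thread can change state
    // between the get() and the set():
    //   if (state.get() == 0) state.set(1);

    // Atomic version: compareAndSet succeeds only if state is still 0,
    // so exactly one caller ever gets true.
    public boolean tryClaim() {
        return state.compareAndSet(0, 1);
    }
}
```

This is the idiom behind "initialize once" and "first caller wins" logic, no lock required.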
Common pitfalls
- Adding a lock around the wrong thing, protecting the read but not the write, or vice versa
- Using a different lock instance for related operations, looks locked, isn't
- Assuming int64 reads/writes are atomic, they're not on 32-bit platforms or under reordering
Practice problems
Implement the counter three ways: lock-based, atomic, and per-thread accumulation with a final merge. Compare contention behavior as the thread count grows.
APIs worth memorising
- Java: AtomicInteger, AtomicLong, LongAdder, synchronized, ReentrantLock
- Python: threading.Lock, threading.RLock, queue.Queue (avoids the race)
- Go: sync/atomic, sync.Mutex, go run -race
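Of the Java APIs above, LongAdder is the least known: it spreads increments across internal cells to reduce contention on hot write-mostly counters, then merges them when you ask for the total. A minimal usage sketch:

```java
import java.util.concurrent.atomic.LongAdder;

public class AdderDemo {
    public static void main(String[] args) {
        LongAdder hits = new LongAdder();
        hits.increment();                 // cheap even under heavy write contention
        hits.add(5);
        System.out.println(hits.sum());   // prints 6; sum() merges the internal cells
    }
}
```

The trade-off: sum() is not a snapshot under concurrent writes, so prefer AtomicLong when reads must be exact mid-flight.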
Race conditions cause the worst production incidents. Knight Capital's $440M loss (2012) was a deployment race. The CSRF Same-Site cookie patch in Chrome had a race that broke logins for some users. Every postmortem at scale eventually mentions a race condition.