Race Detection in CI
Run race detectors as part of CI. Go: -race. C++/Rust: ThreadSanitizer (TSan). Java: jcstress for memory-model assumptions, plus well-designed concurrent tests. The detectors find concurrent unprotected access regardless of platform, so they catch the bugs that x86's strong ordering hides before those bugs hit ARM in production.
What it is
Race detectors are tools that instrument the program to detect data races at runtime: two threads accessing the same memory without synchronisation, where at least one is a write. They catch a specific bug class (data races) that's responsible for many flaky-test and "works on x86, fails on ARM" stories.
The mainstream detectors:
- Go: -race flag. Built in. Catches data races during test runs.
- C++: ThreadSanitizer (TSan). Built into Clang and GCC. -fsanitize=thread.
- Rust: TSan via RUSTFLAGS="-Zsanitizer=thread". Requires a nightly toolchain. Safe Rust rules out data races at compile time, so TSan mainly matters for unsafe code and FFI.
- Java: no general race detector (the JMM gives racy programs defined behaviour). Use jcstress for memory-model assumption tests; standard JUnit for higher-level concurrency tests.
- Python: no general race detector (the GIL hides most races). Stress testing is the practical approach.
How they work
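The standard algorithm is happens-before tracking via vector clocks. Each thread carries a vector clock that advances at synchronisation operations (lock release and acquire, channel operations, atomics), and each memory location remembers the clock of its most recent accesses. On a new access, the detector checks whether the accessing thread's clock dominates the stored one; if it does not, the two accesses are unordered, and if at least one is a write, that's a race.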
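The happens-before check can be sketched in a few dozen lines of Go. This is a toy model, not how the Go race detector or TSan is actually implemented: it tracks only write-write races, ignores reads, and identifies memory locations and locks by string names. The Detector type and its methods are hypothetical names introduced for illustration.

```go
package main

import "fmt"

// VC is a vector clock: one logical counter per thread.
type VC []int

// leq reports whether v <= other element-wise, i.e. everything v
// knows about is already known to other (v happens-before other).
func (v VC) leq(other VC) bool {
	for i := range v {
		if v[i] > other[i] {
			return false
		}
	}
	return true
}

// join raises each entry of v to at least the matching entry of other.
func (v VC) join(other VC) {
	for i := range v {
		if other[i] > v[i] {
			v[i] = other[i]
		}
	}
}

func (v VC) clone() VC { return append(VC(nil), v...) }

// Detector is a toy write-write race detector (hypothetical type).
type Detector struct {
	threads   []VC          // current clock of each thread
	lastWrite map[string]VC // clock of the last write to each location
	locks     map[string]VC // clock published by the last release of each lock
}

func NewDetector(n int) *Detector {
	d := &Detector{lastWrite: map[string]VC{}, locks: map[string]VC{}}
	for t := 0; t < n; t++ {
		c := make(VC, n)
		c[t] = 1
		d.threads = append(d.threads, c)
	}
	return d
}

// Write records a write by thread t and reports whether it races with
// the previous write, i.e. that write does not happen-before this one.
func (d *Detector) Write(t int, loc string) bool {
	prev, seen := d.lastWrite[loc]
	race := seen && !prev.leq(d.threads[t])
	d.lastWrite[loc] = d.threads[t].clone()
	return race
}

// Release publishes thread t's clock on the lock, then advances t.
func (d *Detector) Release(t int, lock string) {
	d.locks[lock] = d.threads[t].clone()
	d.threads[t][t]++
}

// Acquire pulls the releasing thread's knowledge into thread t.
func (d *Detector) Acquire(t int, lock string) {
	if c, ok := d.locks[lock]; ok {
		d.threads[t].join(c)
	}
}

func main() {
	d := NewDetector(2)
	d.Write(0, "x")
	fmt.Println("unsynchronised:", d.Write(1, "x")) // true

	d = NewDetector(2)
	d.Acquire(0, "m")
	d.Write(0, "x")
	d.Release(0, "m")
	d.Acquire(1, "m")
	fmt.Println("lock-ordered:", d.Write(1, "x")) // false
}
```

The release/acquire pair is what creates the ordering: without it, thread 1's clock knows nothing about thread 0's write, so the comparison fails and the access is flagged.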
The runtime cost: typically a 2-10x slowdown, sometimes more (the Go documentation quotes up to 20x), and 5-10x more memory. Acceptable for CI; not for production.
Where to run them
CI. Every PR. Run the test suite with race detection enabled. Failures block merge.
If the test suite is too slow with -race (long-running integration tests, large parameter sweeps), split into a fast race-detected subset (per PR) and a full suite (nightly).
What they do and don't catch
Catch:
- Concurrent unprotected access to a variable.
- Missing happens-before edges (data published to another thread without synchronisation).
- Mixed atomic and plain (non-atomic) access to the same variable.
Don't catch:
- Deadlocks (no progress = no race).
- Livelocks (threads keep running but make no useful progress; every access can still be correctly synchronised, so there is nothing to report).
- Missing notifications / lost wakeups.
- Atomicity violations across multiple atomics.
- Bugs that didn't happen during the test (test didn't trigger the race).
The race detector is necessary, not sufficient. Combine with stress tests that exercise concurrent paths at high load; the combination catches more.
Stress testing
For Go: high-N goroutine tests that perform many operations concurrently. Run repeatedly under -race to maximise the chance of triggering races.
go test -race -count=100 -run=TestConcurrent
For Java: concurrent JUnit tests with N threads doing M operations. Use jcstress for the lower-level memory-model fixtures.
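For Python: use the pytest-repeat plugin (pytest --count=N) for repeated runs, optionally with pytest-xdist to spread them across processes. Fewer races to find (the GIL helps), but operations that span multiple bytecodes, such as a read-modify-write on shared state, can still race.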
The race detector + stress test combination is the standard for catching concurrency bugs before production.
Race detection in CI is non-negotiable for any concurrent code: cheap, automatic, catches a major bug class. It is not a complete safety net, so combine it with stress tests, code review, and deadlock analysis. And remember it is a development-time tool; production runs un-instrumented, so the catch has to happen before ship.
The race detector finds the bugs that hide on x86 and surface on ARM. Every team should be running it.
Implementations
jcstress runs tiny concurrent test fixtures millions of times in different JVMs and with different scheduling, looking for unexpected outcomes. The gold standard for testing lock-free Java code.
// pom.xml: depend on org.openjdk.jcstress:jcstress-core

import org.openjdk.jcstress.annotations.*;
import org.openjdk.jcstress.infra.results.II_Result; // named IntResult2 in older jcstress releases

@JCStressTest
@Outcome(id = "0, 0", expect = Expect.ACCEPTABLE, desc = "reader ran before the writer")
@Outcome(id = "0, 1", expect = Expect.ACCEPTABLE, desc = "data written, ready not yet seen")
@Outcome(id = "1, 1", expect = Expect.ACCEPTABLE, desc = "reader sees the full publication")
@Outcome(id = "1, 0", expect = Expect.FORBIDDEN, desc = "ready seen without data: broken publication")
@State
public class PublishTest {
    int data;
    volatile boolean ready;

    @Actor
    public void writer() {
        data = 1;      // plain write
        ready = true;  // volatile write publishes data
    }

    @Actor
    public void reader(II_Result r) {
        r.r1 = ready ? 1 : 0; // volatile read
        r.r2 = data;          // must be 1 whenever r1 == 1
    }
}

// Run: mvn package; java -jar target/jcstress.jar
// Output: per-fixture, count of each observed outcome.
// FORBIDDEN outcomes that occur are bugs.
Key points
- Go: go test -race covers most data races. Run in CI on every PR.
- C++/Rust: ThreadSanitizer (TSan). Compile with -fsanitize=thread (C++) or -Zsanitizer=thread (Rust), then run the tests.
- Java: jcstress for memory-model bugs (lock-free code). Standard JUnit for higher-level concurrency.
- Race detectors slow execution 2-10x and use more memory; run in CI, not production.
- Stress-test concurrent code with high goroutine/thread counts; race detector + load surfaces real bugs.
Follow-up questions
- Should the race detector be on for production?
- How does Go's race detector compare to TSan?
- Does the race detector catch all bugs?
- What about Rust?
Gotchas
- Not running -race / TSan in CI: bugs hide until ARM production.
- Tests that don't actually exercise concurrency: the race detector reports nothing.
- Long test suites time out under -race because of the roughly 10x slowdown; tune timeouts.
- Race-free does not mean correct; algorithmic concurrency bugs (deadlock, livelock) are still possible.
- Skipping race detection on 'performance' tests: those are exactly where races hide.