Java Memory Model & Happens-Before
The JMM defines which reads can observe which writes across threads. The happens-before relation is the rule set: if action A happens-before action B, then A's effects are visible to B.
The short version
The Java Memory Model (JMM) is the rulebook that tells the JVM, the JIT, and the CPU what reads of shared variables are allowed to return. Without it, the system is free to reorder writes for performance, and one thread's writes can appear to another thread in any order, or not at all.
The JMM offers one core idea: happens-before. If action A happens-before action B, then everything A wrote is guaranteed to be visible to B. If there's no happens-before edge between two actions, the JMM offers nothing: no visibility, no ordering, no guarantees.
That's the entire game. Build code so that every depended-on write has a happens-before edge to every read of it.
What "happens-before" really means
The cleanest mental model is a publish-then-subscribe contract between two threads. The writer publishes by doing one of a small set of special operations (a volatile write, a lock release, a channel send). The reader subscribes by doing the matching operation (a volatile read of the same field, a lock acquire of the same lock, a channel receive). The contract is: everything the writer did before publishing is guaranteed to be visible to the reader after subscribing.
Picture two unrelated plain fields, data and config, and one volatile flag, ready, published between them.
Without that publish-subscribe pair, the reader could see data = 0 and config = null even after the writer "obviously" set them. The compiler is allowed to reorder writes within the writer thread because the local single-thread behaviour does not change. The CPU is allowed to delay writes in a per-core store buffer because, again, the local thread sees its own writes in order. Neither reordering matters in single-threaded code, but both wreck cross-thread reads unless a happens-before edge forces the order.
The four edges that cover almost everything
The JMM defines a small set of operations that produce happens-before edges. Four of them cover almost every real situation.
| Edge | What it says |
|---|---|
| Program order | Within a single thread, statement N happens-before statement N+1. |
| Monitor (lock) | unlock(m) happens-before every later lock(m) on the same monitor. |
| Volatile | A write to v happens-before every later read of v that observes the new value. |
| Thread start and join | Thread.start() happens-before any action in the started thread. Any action in a thread happens-before another thread's successful join() return. |
Inside one thread, the program-order edge chains every statement to the next. Across threads, the lock, volatile, or start/join edges stitch threads together by connecting one thread's publish to another thread's subscribe.
The JMM then takes the transitive closure of all those edges. If A happens-before B, and B happens-before C, then A happens-before C, even if A and C are in different threads. This is what makes a real lock-and-write pattern actually work: the writer thread chains its plain writes to the unlock by program order, the unlock chains to the next lock by the monitor edge, and the next lock chains to the reader's plain reads by program order. End to end, every plain write the writer did is now visible to every plain read the reader does inside the matching lock.
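The lock-and-write chain described above can be sketched in a few lines (the class and field names here are illustrative, not from the original):

```java
// Transitivity in action: plain writes -> unlock -> next lock -> plain reads.
class LockedBox {
    private final Object lock = new Object();
    private int data;        // plain field, guarded by lock
    private String config;   // plain field, guarded by lock

    void writer() {
        synchronized (lock) {   // lock(m)
            data = 42;          // program order chains these writes...
            config = "ready";
        }                       // ...to the unlock(m)
    }

    String reader() {
        synchronized (lock) {   // lock(m): sees everything before the last unlock
            return data + "/" + config;  // plain reads, covered by transitivity
        }
    }
}
```

If the reader wins the race for the lock before the writer has run, it simply sees the old values; the guarantee is only about visibility once the edge exists, not about scheduling order.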
What volatile actually does
A volatile field gives two guarantees in one keyword:
- Atomic reads and writes for that field. No torn values, even for 64-bit long and double fields that are otherwise allowed to tear on 32-bit JVMs. A reference field never appears half-written.
- A happens-before edge across the access. Everything the writer did before a volatile write is visible to any thread that reads the same volatile field afterward and sees the new value.
volatile is the cheapest happens-before edge in the language. One keyword, no lock object, no contention, no blocking. It is the right tool for write-once-then-read-many patterns: status flags, lazily-published references, "is the cache initialised yet" gates, the guard in a double-checked-locking singleton.
What volatile does not do is make compound operations atomic. volatile int counter; counter++; is still a race, because ++ is a read-modify-write sequence of three steps: load the value, add one, store it back. Two threads can both read the same value, both add one, both store, and one increment is silently lost. The fix is AtomicInteger.incrementAndGet() for low contention or LongAdder.increment() for high contention.
What synchronized does that volatile doesn't
synchronized and volatile both produce a happens-before edge. The difference is what else they give.
| Guarantee | synchronized | volatile |
|---|---|---|
| Happens-before edge across the operation | yes | yes |
| Mutual exclusion (only one thread inside at a time) | yes | no |
| Reentrant (same thread can re-enter) | yes | not applicable |
| Protects multi-field invariants | yes | no |
Use synchronized (or ReentrantLock) when more than one field has to change together, or when a check-then-act sequence has to be atomic against other threads. Use volatile when a single field is enough and no other thread needs to be excluded.
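A minimal sketch of a multi-field invariant that volatile cannot protect but synchronized can (the Range class is a hypothetical example, not from the original):

```java
// Invariant: lo <= hi at all times. Marking each field volatile would make
// each read/write visible individually, but another thread could still
// observe lo from the new pair and hi from the old one. The lock prevents that.
class Range {
    private int lo, hi;   // guarded by "this"

    synchronized void set(int newLo, int newHi) {
        if (newLo > newHi) throw new IllegalArgumentException("lo > hi");
        lo = newLo;       // both writes sit inside one critical section,
        hi = newHi;       // so no thread ever sees a mixed pair
    }

    synchronized int[] get() {   // reads take the same lock: consistent snapshot
        return new int[] { lo, hi };
    }
}
```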
final fields are special
Once a constructor completes normally, any thread that obtains a reference to the constructed object is guaranteed to see its final fields fully initialised, even with no synchronization. This is called the final field freeze. It is why deeply immutable objects (records, frozen value classes, String, BigDecimal) are safe to share across threads with nothing more than a plain reference handoff.
The freeze has one important catch: it only applies to objects whose construction has actually finished. If the constructor leaks this mid-construction (registers itself as a listener, stores itself in a static field, hands itself to another thread before the constructor returns), other threads can observe the partially-built object before the freeze takes effect, and the guarantee is lost. The rule is: do not let this escape the constructor.
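A hedged illustration of the escape rule; EventBus and its register method are hypothetical stand-ins for any listener registry:

```java
interface EventBus { void register(Object listener); }

class Leaky {
    final int id;
    Leaky(EventBus bus) {
        bus.register(this);  // BAD: "this" escapes before construction ends;
        this.id = 42;        // another thread could observe id == 0
    }
}

class Safe {
    final int id;
    private Safe() { this.id = 42; }     // constructor completes first
    static Safe create(EventBus bus) {
        Safe s = new Safe();             // final-field freeze takes effect here
        bus.register(s);                 // publish only after construction
        return s;
    }
}
```

The factory-method shape is the standard fix: construction and publication become two separate steps, and the freeze sits between them.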
How Python and Go compare
Python has no formal memory model. CPython relies on the GIL: only one thread runs Python bytecode at a time, so each individual bytecode is effectively atomic and simple reads and writes of single fields do not tear. Reordering and cross-thread visibility of more elaborate operations are implementation-defined, and they differ between CPython, PyPy, and free-threaded CPython. The portable advice is to synchronize explicitly with threading.Lock, threading.Event, or queue.Queue. Treat anything else as undefined.
Go has a smaller and tighter memory model than Java. Three edges cover almost everything:
| Edge | What it says |
|---|---|
| Channel | A send on a channel happens-before the matching receive completes. |
| Mutex | A Mutex.Unlock happens-before the next Mutex.Lock on the same mutex. |
| Atomic | An atomic Store happens-before any later atomic Load that sees the stored value. |
Anything outside those three rules is a data race, and Go's race detector (enabled with the -race flag on go test or go build) will catch many of them during testing. The rule of thumb: if two goroutines share data, the writer and reader must synchronize through a channel, a mutex, or an atomic. There is no volatile-like keyword in Go.
Picking the right primitive
| Problem | Java | Go | Python |
|---|---|---|---|
| Single boolean flag | volatile | atomic.Bool | threading.Event |
| Hot counter | LongAdder | atomic.Int64 | Lock + int |
| Multi-field invariant | synchronized / ReentrantLock | sync.Mutex | threading.Lock |
| Lazy init (one-time) | static-holder idiom | sync.Once | module-level lock |
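The static-holder idiom from the table row above relies on the JVM's class-initialisation guarantee: initialisation runs exactly once, under an internal lock, with a happens-before edge to every subsequent use. A minimal sketch:

```java
class Config {
    private Config() {}                      // no outside construction

    private static class Holder {            // not loaded until first access
        static final Config INSTANCE = new Config();
    }

    static Config get() {
        return Holder.INSTANCE;              // triggers class init exactly once
    }
}
```

Compared with double-checked locking, there is no volatile field and no hand-written locking; the JVM does the synchronization for you.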
The most subtle bug in the JMM
A field that's protected by synchronized in some methods and accessed plain in others has no visibility guarantee at all. The JMM is all-or-nothing per field: either every access is protected, or none of the protection means anything.
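A sketch of that all-or-nothing rule, with illustrative names:

```java
class Mixed {
    private int count;   // intended to be guarded by "this"

    synchronized void increment() { count++; }      // protected write

    int read() { return count; }                    // BROKEN: plain read, no
                                                    // happens-before edge, may
                                                    // return a stale value forever
    synchronized int readSafe() { return count; }   // correct: same lock as the write
}
```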
Primitives by language
- volatile
- synchronized
- final
- java.util.concurrent.atomic.*
- VarHandle (acquire/release/opaque)
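For the last item in the list, a minimal VarHandle sketch: setRelease/getAcquire give the same publish-subscribe edge as a volatile field, but chosen per access instead of for every access (the Publisher class is a hypothetical example):

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

class Publisher {
    private int value;
    private boolean ready;
    private static final VarHandle READY;
    static {
        try {
            READY = MethodHandles.lookup()
                    .findVarHandle(Publisher.class, "ready", boolean.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    void writer() {
        value = 42;
        READY.setRelease(this, true);            // release: prior writes published
    }

    Integer reader() {
        if ((boolean) READY.getAcquire(this)) {  // acquire: pairs with the release
            return value;                        // guaranteed 42 once ready is seen
        }
        return null;                             // flag not observed yet
    }
}
```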
Implementation
Without volatile, the JIT can hoist the read of ready out of the loop, and the write to ready can become visible before the write to value. This isn't theoretical; it happens on real hardware under load.
```java
class Broken {
    boolean ready = false;
    int value = 0;

    void writer() {
        value = 42;
        ready = true;   // may be reordered before value = 42
    }

    void reader() {
        while (!ready) { /* spin */ }  // may spin forever or read stale
        System.out.println(value);     // value MAY still be 0
    }
}
```

Writing to a volatile field acts as a release: all prior writes in this thread are made visible. Reading from it acts as an acquire: the reader sees everything the writer published.
```java
class Fixed {
    int value = 0;
    volatile boolean ready = false;

    void writer() {
        value = 42;
        ready = true;   // release: publishes prior writes
    }

    void reader() {
        while (!ready) { /* spin */ }  // acquire
        System.out.println(value);     // guaranteed 42
    }
}
```

volatile makes a single read or write visible, but ++ is read-modify-write, which is three operations. Two threads can interleave and lose updates. AtomicInteger (and LongAdder for high contention) fixes this with CAS.
```java
// volatile is for visibility, NOT atomicity
volatile int counter = 0;
counter++; // NOT atomic, can lose updates

// Use AtomicInteger for compound ops
AtomicInteger atomic = new AtomicInteger(0);
atomic.incrementAndGet(); // CAS-based, atomic

// LongAdder shards internally, better under contention
LongAdder adder = new LongAdder();
adder.increment();
```

DCL was famously broken before Java 5 because the JMM didn't constrain construction order. With volatile, the constructor's writes happen-before any thread's read of a non-null reference. The static-holder idiom is even simpler; prefer it when possible.
```java
class Singleton {
    private static volatile Singleton instance;

    public static Singleton get() {
        Singleton local = instance;
        if (local == null) {
            synchronized (Singleton.class) {
                local = instance;
                if (local == null) {
                    local = new Singleton();
                    instance = local;
                }
            }
        }
        return local;
    }
}
```

Key points
- Without happens-before, the JVM/CPU can reorder reads/writes for performance
- volatile guarantees: visibility (latest value) + ordering (no reordering across the access)
- synchronized guarantees: mutual exclusion + happens-before from lock release to the next acquire
- final fields: safely visible after the constructor completes, no synchronization needed
- Thread.start() happens-before any action in the started thread
- Any action in a thread happens-before another thread's successful join() return
Tradeoffs
| Option | Pros | Cons | When to use |
|---|---|---|---|
| volatile field | One keyword; no lock object; non-blocking; cheap | No mutual exclusion; compound operations still race | Status flags, write-once references, DCL guard |
| synchronized | Mutual exclusion; reentrant; protects multi-field invariants | Blocking; contention overhead; deadlock risk | Multi-step invariants on shared state |
| AtomicInteger / VarHandle | Lock-free CAS; atomic read-modify-write; fine-grained ordering control | Single-variable scope; subtler API | Hot counters, lock-free data structures, performance-critical paths |
Follow-up questions
- What does volatile guarantee that a regular field doesn't?
- Why doesn't volatile make ++ atomic?
- What is happens-before?
- Are final fields thread-safe without synchronization?
- Does Python have a memory model like Java?
Gotchas
- Reading a volatile array reference is volatile, but reading array[i] is NOT; use AtomicReferenceArray
- synchronized on a String literal or autoboxed Boolean can deadlock unrelated code (shared instances)
- Constructors that publish this before completion break the final-field guarantee
- long and double are NOT atomic on 32-bit JVMs without volatile
- CPython's atomic-bytecode behavior is not portable to PyPy or free-threaded CPython
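The first gotcha in the list deserves a sketch: AtomicReferenceArray gives per-element volatile semantics that a volatile array field does not (the Slots class is an illustrative name):

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

class Slots {
    // With "volatile String[] arr", reading arr is a volatile read,
    // but reading arr[i] is a plain read with no happens-before edge.
    private final AtomicReferenceArray<String> arr =
            new AtomicReferenceArray<>(16);

    void publish(int i, String s) { arr.set(i, s); }  // volatile-write per slot
    String read(int i) { return arr.get(i); }         // volatile-read per slot
}
```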
Common pitfalls
- Assuming x86's strong memory model removes the need for volatile; JIT reordering still applies
- Mixing synchronized and unsynchronized access to the same field; visibility is not guaranteed
- Relying on the GIL for cross-thread visibility in Python
Practice problems
- Double-checked locking with volatile, or the static-holder idiom (preferred)
APIs worth memorising
- Java: java.util.concurrent.atomic.{AtomicInteger, AtomicReference, LongAdder}, VarHandle
- Python: threading.Lock, threading.Event, queue.Queue
- Go: sync/atomic.{Bool, Int64, Pointer}, channels for happens-before
These rules underpin every concurrent Java library: ConcurrentHashMap uses volatile + CAS, ReentrantLock builds on AbstractQueuedSynchronizer with a volatile state field, and Spring's bean container relies on safe publication.