Memory Ordering: Acquire, Release, Relaxed, SeqCst

The plain-English version

Modern CPUs and compilers reorder reads and writes to make code fast. Inside a single thread the reordering is invisible, because both the compiler and the CPU preserve the behaviour that thread can observe. Across threads, the same reorderings turn obvious-looking code into mysterious bugs: a thread reads a flag and then reads the data the flag was meant to publish, but the data write is not yet visible to it.

Memory ordering is the set of dials a programmer can use to control how much reordering is allowed around an atomic operation. The cleanest mental model is a publish-subscribe contract:

A "release" is a publish. It says "everything I wrote before this point is now ready for another thread to see."
An "acquire" is a subscribe. It says "whatever the publisher wrote before their release, I can now see it."

When an acquire-load reads the value written by a release-store on the same atomic variable, every write the publisher made before the release is visible to the subscriber after the acquire. Without that release-acquire pair on the same variable, cross-thread visibility is not guaranteed at all.

A picture of acquire-release

The producer fills in three pieces of data, then sets a ready flag with a release-store. The consumer waits until the ready flag becomes true with an acquire-load, then reads the data. The release-acquire pair on ready is what makes the data writes visible to the consumer.

The release-store says "everything above me, publish". The acquire-load says "everything below me, subscribe to whatever was published". Both halves are needed. A release with no matching acquire publishes into the void. An acquire with no matching release subscribes to nothing.
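The producer and consumer described above can be sketched in Rust roughly as follows. This is a minimal illustration, not a canonical implementation: the `Shared` struct, its field names, and the `publish_and_read` function are made up for the example, and the three data fields are atomics written with Relaxed so the sketch stays in safe Rust.

```rust
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};
use std::sync::Arc;
use std::thread;

/// Three pieces of data plus the flag that publishes them.
/// All names here are hypothetical, chosen for the example.
pub struct Shared {
    a: AtomicU32,
    b: AtomicU32,
    c: AtomicU32,
    ready: AtomicBool,
}

/// Runs one producer and one consumer and returns what the consumer read.
pub fn publish_and_read() -> (u32, u32, u32) {
    let shared = Arc::new(Shared {
        a: AtomicU32::new(0),
        b: AtomicU32::new(0),
        c: AtomicU32::new(0),
        ready: AtomicBool::new(false),
    });

    let producer = {
        let s = Arc::clone(&shared);
        thread::spawn(move || {
            // Plain data writes: no ordering of their own.
            s.a.store(1, Ordering::Relaxed);
            s.b.store(2, Ordering::Relaxed);
            s.c.store(3, Ordering::Relaxed);
            // Release-store: "everything above me, publish".
            s.ready.store(true, Ordering::Release);
        })
    };

    let consumer = {
        let s = Arc::clone(&shared);
        thread::spawn(move || {
            // Acquire-load: wait until the published flag is observed.
            while !s.ready.load(Ordering::Acquire) {
                thread::yield_now();
            }
            // Once the acquire has read `true`, the three writes are visible.
            (
                s.a.load(Ordering::Relaxed),
                s.b.load(Ordering::Relaxed),
                s.c.load(Ordering::Relaxed),
            )
        })
    };

    producer.join().unwrap();
    consumer.join().unwrap()
}
```

If either the store or the load on `ready` were downgraded to Relaxed, the consumer would be allowed to see the flag set and still read stale zeros from the data fields.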

The four ordering levels

C++ groups its memory orders into four levels, and Rust exposes the same set except consume. Three of them matter in practice.

SeqCst (sequentially consistent). All SeqCst operations across all threads agree on a single global timeline. The strongest level and the easiest to reason about, because every thread sees the same total order of SeqCst events. The most expensive on weakly-ordered CPUs like ARM and POWER, where each operation typically needs a fence or a specially ordered instruction. The right choice when correctness depends on every thread seeing the same order of events.

Acquire (on a load) plus Release (on a store). The publish-subscribe pair shown above. The everyday default for lock-free code. Cheaper than SeqCst, strong enough for almost every producer-consumer pattern. Acquire applies to loads and release applies to stores; the two cannot be swapped. A read-modify-write operation such as compare-and-swap does both a load and a store, so it can carry both at once (AcqRel in Rust, memory_order_acq_rel in C++).

Relaxed. Atomic, but no ordering at all. The operation itself is indivisible (no torn reads or writes), but the compiler and CPU are free to reorder it against anything else. Use for counters and statistics where the only thing that matters is "do not lose increments". Never use for anything that publishes data.

Consume. A C++-only level (memory_order_consume) meant for very specific pointer-following patterns. Mainstream compilers quietly promote it to acquire. Treat it as not existing.
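The "single global timeline" that SeqCst promises is easiest to see in the classic store-buffering test, which is the core of Dekker-style algorithms. The sketch below is an illustration under stated assumptions: the function names are invented for the example. Each thread sets its own flag and then checks the other's. With SeqCst on all four operations, at least one thread must see the other's store. With only Release and Acquire, both threads reading `false` would be an allowed outcome.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

/// One round of the store-buffering test.
/// Returns (what thread 1 read from y, what thread 2 read from x).
pub fn store_buffering_round() -> (bool, bool) {
    let x = Arc::new(AtomicBool::new(false));
    let y = Arc::new(AtomicBool::new(false));

    let t1 = {
        let (x, y) = (Arc::clone(&x), Arc::clone(&y));
        thread::spawn(move || {
            x.store(true, Ordering::SeqCst); // "I am here"
            y.load(Ordering::SeqCst) // "is the other thread here?"
        })
    };
    let t2 = {
        let (x, y) = (Arc::clone(&x), Arc::clone(&y));
        thread::spawn(move || {
            y.store(true, Ordering::SeqCst);
            x.load(Ordering::SeqCst)
        })
    };

    (t1.join().unwrap(), t2.join().unwrap())
}

/// Counts rounds in which both threads missed each other's store.
/// Under SeqCst this count is always zero.
pub fn rounds_where_both_missed(rounds: usize) -> usize {
    (0..rounds)
        .filter(|_| store_buffering_round() == (false, false))
        .count()
}
```

A passing run does not prove that a weaker ordering would fail on a given machine; the guarantee comes from the memory model, not from the test.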

Pick by the operation

The level to choose depends on what the atomic is doing.

Operation	Right choice	Why
Counter (hits, retries, log lines)	Relaxed	Order does not matter; only the count does
Publishing data behind a flag or pointer	Release on the store, Acquire on the load	The flag is the handoff; the pair makes the data visible
Algorithms that need every thread to agree on a single global order (Dekker, Peterson)	SeqCst	Anything weaker can produce histories the algorithm was not designed for

For about 90% of lock-free code, acquire-release is the right answer. SeqCst is the safe default when there is doubt. Relaxed is reserved for measured hot paths where the only operation is "increment a counter".
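The counter case from the table might look like the following sketch. The `count_hits` function and its parameters are hypothetical names for the example. Each increment is Relaxed because nothing else is published through the counter; the final read is reliable here because joining a thread already synchronises with everything that thread did.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Spawns `threads` workers that each record `hits_per_thread` hits,
/// then returns the total.
pub fn count_hits(threads: usize, hits_per_thread: u64) -> u64 {
    let hits = Arc::new(AtomicU64::new(0));

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let hits = Arc::clone(&hits);
            thread::spawn(move || {
                for _ in 0..hits_per_thread {
                    // Relaxed: the increment is atomic, so no hit is lost,
                    // but it orders nothing else around it.
                    hits.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    // All workers have been joined, so every increment is visible.
    hits.load(Ordering::Relaxed)
}
```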

Why this is hard to get right

Memory-ordering bugs hide in plain sight. The code looks correct, the tests pass on the developer's laptop, and then production catches fire on a different CPU. The bug is not a wrong line. The bug is a missing dial setting on an atomic operation.

The reason this happens is that x86 has a strong memory model and ARM and POWER have weak ones. The same code behaves differently on each.

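Ordering	x86 (strong ordering)	ARM and POWER (weak ordering)
Relaxed	Often "works" by accident: the hardware reorders little, though the compiler still can	Hardware reorders aggressively; bugs surface fast
Acquire-release	Essentially free: ordinary loads and stores already behave this way	Needs a fence or an ordered load/store instruction; cheap but not free
SeqCst	Loads are basically free, stores need a fence or a locked instruction	Both loads and stores need ordering instructions; real per-operation cost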

Debugging happens on Intel laptops. Shipping happens on ARM hardware: AWS Graviton servers, Apple Silicon Macs, mobile devices. Same code, different behaviour.

A few defensive habits help:

Default to SeqCst unless there is a measured reason to relax.
For lock-free code, write down which release pairs with which acquire, and which writes the pairing protects. If that note cannot be written, the design is wrong.
Get a second pair of eyes on the reasoning. One person almost never catches every reordering.
Test on a weakly-ordered machine (an ARM box or a Mac with Apple Silicon) before shipping.
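The "write down the pairing" habit can live directly in the code as a comment. The sketch below is one hypothetical way to do it: `Config`, `ConfigSlot`, and their methods are invented for the example, and in real code the standard library's `OnceLock` would usually be the better tool for publish-once data.

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

/// Example payload; the fields are placeholders.
pub struct Config {
    pub retries: u32,
    pub timeout_ms: u64,
}

/// A slot that can be filled once and then read from any thread.
pub struct ConfigSlot {
    ptr: AtomicPtr<Config>,
}

// PAIRING NOTE
//   Release: the successful compare_exchange in `publish`.
//   Acquire: the load in `get`.
//   Protects: the writes that initialised the `Config` before publication.
impl ConfigSlot {
    pub const fn new() -> Self {
        ConfigSlot { ptr: AtomicPtr::new(ptr::null_mut()) }
    }

    /// Publishes `cfg` if the slot is empty. Returns false if it was already set.
    pub fn publish(&self, cfg: Config) -> bool {
        let raw = Box::into_raw(Box::new(cfg));
        match self.ptr.compare_exchange(
            ptr::null_mut(),
            raw,
            Ordering::Release, // publish the initialised Config
            Ordering::Relaxed, // on failure nothing is read through the pointer
        ) {
            Ok(_) => true,
            Err(_) => {
                // SAFETY: `raw` came from Box::into_raw above and was never shared.
                unsafe { drop(Box::from_raw(raw)) };
                false
            }
        }
    }

    /// Returns the published config, if any.
    pub fn get(&self) -> Option<&Config> {
        let raw = self.ptr.load(Ordering::Acquire); // subscribe
        if raw.is_null() {
            None
        } else {
            // SAFETY: a non-null pointer was published by `publish` and is
            // only freed in `drop`, which requires exclusive access.
            Some(unsafe { &*raw })
        }
    }
}

impl Drop for ConfigSlot {
    fn drop(&mut self) {
        let raw = *self.ptr.get_mut();
        if !raw.is_null() {
            // SAFETY: the slot owns the allocation and no reader can outlive it.
            unsafe { drop(Box::from_raw(raw)) };
        }
    }
}
```

If the three lines of the pairing note cannot be filled in for an atomic, that is the signal the habit above is meant to catch.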

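Why Java and Go hide these dials

Both languages decided that the bug surface from per-operation ordering choices was not worth the speed gain for everyday code. In Java, the default operations on AtomicInteger, AtomicReference, and friends (get, set, compareAndSet) have volatile semantics, which are sequentially consistent; weaker modes exist (lazySet, and since Java 9 methods such as getAcquire and setRelease), but they are opt-in. Go's sync/atomic operations behave as if sequentially consistent (spelled out in the Go memory model since Go 1.19) and are not user-tunable.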

For 99% of application code this is the right trade. The remaining 1% (HFT systems, custom lock-free data structures, kernel code) is where people reach for C++ or Rust to control the dials directly.

The takeaway

Think of release as "publish" and acquire as "subscribe". In lock-free code, find the release-acquire pair on each shared atomic, and ask which writes are being published from one side to the other. If that pair is not visible in the code, the code is broken.
