Memory Ordering: Acquire, Release, Relaxed, SeqCst
C++ and Rust let the programmer specify how strict each atomic operation's memory ordering must be. SeqCst (sequentially consistent) is the strongest and usually the slowest. Acquire-release is the right default for publishing data. Relaxed gives atomicity but no ordering, which suits counters. Java and Go atomics are SeqCst by default, so in everyday code there is no choice to make.
The plain-English version
Modern CPUs and compilers reorder reads and writes to make code fast. Inside a single thread, the reordering is invisible because the compiler is careful to keep the program's observable behaviour the same. Across threads, the reorderings turn obvious-looking code into mysterious bugs: a thread reads a flag and then reads the data the flag was meant to publish, but the data write has not propagated yet.
Memory ordering is the set of dials a programmer can use to control how much reordering is allowed around an atomic operation. The cleanest mental model is a publish-subscribe contract:
- A "release" is a publish. It says "everything I wrote before this point is now ready for another thread to see."
- An "acquire" is a subscribe. It says "whatever the publisher wrote before their release, I can now see it."
When a release-store publishes to an atomic variable and an acquire-load on the same variable reads the value that store wrote, all the writes the publisher did before the release become visible to the subscriber after the acquire. Without that release-acquire pair on the same variable, the cross-thread visibility is not guaranteed at all.
A picture of acquire-release
The producer fills in three pieces of data, then sets a ready flag with a release-store. The consumer waits until the ready flag becomes true with an acquire-load, then reads the data. The release-acquire pair on ready is what makes the data writes visible to the consumer.
The release-store says "everything above me, publish". The acquire-load says "everything below me, subscribe to whatever was published". Both halves are needed. A release with no matching acquire publishes into the void. An acquire with no matching release subscribes to nothing.
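The handoff described above can be sketched in Rust roughly as follows. This is a minimal illustration, not a definitive implementation: the `Shared` struct, the `publish_and_consume` function, and the payload values 10, 20, 30 are all hypothetical names and numbers chosen for the example. The payload words are stored with Relaxed ordering on purpose, to show that the release-acquire pair on `ready` is the only thing that makes them visible.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

// Hypothetical shared state: three payload words plus a ready flag.
struct Shared {
    data: [AtomicU64; 3],
    ready: AtomicBool,
}

// Runs one producer and one consumer and returns what the consumer read.
fn publish_and_consume() -> [u64; 3] {
    let shared = Arc::new(Shared {
        data: [AtomicU64::new(0), AtomicU64::new(0), AtomicU64::new(0)],
        ready: AtomicBool::new(false),
    });

    let producer = {
        let s = Arc::clone(&shared);
        thread::spawn(move || {
            // Payload writes carry no ordering of their own.
            s.data[0].store(10, Ordering::Relaxed);
            s.data[1].store(20, Ordering::Relaxed);
            s.data[2].store(30, Ordering::Relaxed);
            // RELEASE: publishes every write above this line.
            s.ready.store(true, Ordering::Release);
        })
    };

    let consumer = {
        let s = Arc::clone(&shared);
        thread::spawn(move || {
            // ACQUIRE: pairs with the Release store on `ready`.
            while !s.ready.load(Ordering::Acquire) {
                std::hint::spin_loop();
            }
            // Once `true` has been observed, the payload writes are visible.
            [
                s.data[0].load(Ordering::Relaxed),
                s.data[1].load(Ordering::Relaxed),
                s.data[2].load(Ordering::Relaxed),
            ]
        })
    };

    producer.join().unwrap();
    consumer.join().unwrap()
}
```

Calling `publish_and_consume()` should always return the three published values. If both operations on `ready` were downgraded to Relaxed, the language would no longer guarantee that, even though the code might still appear to work on x86.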
The four ordering levels
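There are four ordering levels to know. C++ has all four; Rust exposes the first three and has no Consume. Three of them matter in practice. (In the APIs themselves, acquire and release are separate values, and there is a combined acquire-release value for read-modify-write operations, which is why C++ has six `memory_order` constants and Rust has five `Ordering` variants.)
SeqCst (sequentially consistent). All SeqCst operations across all threads agree on a single global timeline. The strongest level and the easiest to reason about, because every thread sees the same total order of SeqCst events. The most expensive on weakly-ordered CPUs like ARM and POWER, where each operation needs a fence or a specially ordered instruction. The right choice when correctness depends on every thread seeing the same order of events.
Acquire (on a load) plus Release (on a store). The publish-subscribe pair shown above. The everyday default for lock-free code. Cheaper than SeqCst, strong enough for almost every producer-consumer pattern. A pure load can be acquire but not release, and a pure store can be release but not acquire. Read-modify-write operations such as compare-and-swap can be either, or both at once (acquire-release).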
Relaxed. Atomic, but no ordering at all. The operation itself is indivisible (no torn reads or writes), but the compiler and CPU are free to reorder it against anything else. Use for counters and statistics where the only thing that matters is "do not lose increments". Never use for anything that publishes data.
Consume. A C++-only level for very specific pointer-following patterns; Rust does not offer it. The major compilers quietly promote it to acquire, and the C++ standard has discouraged its use since C++17. Treat it as not existing.
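The Relaxed case described above is the easiest one to show in code. The sketch below is an illustration only: `count_hits` and its parameters are hypothetical names. Each thread increments a shared counter with Relaxed ordering, and no increment is lost because `fetch_add` is still an indivisible read-modify-write.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

// Spawns `threads` workers that each record `hits_per_thread` hits,
// then returns the final count.
fn count_hits(threads: u64, hits_per_thread: u64) -> u64 {
    let hits = Arc::new(AtomicU64::new(0));

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let hits = Arc::clone(&hits);
            thread::spawn(move || {
                for _ in 0..hits_per_thread {
                    // Relaxed: atomic increment, no ordering with other memory.
                    hits.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();

    // join() synchronizes with each finished thread, so the load below
    // observes every increment even though it is Relaxed.
    for h in handles {
        h.join().unwrap();
    }
    hits.load(Ordering::Relaxed)
}
```

For example, `count_hits(4, 10_000)` returns 40000. The counter tells other threads nothing about any other data, which is exactly why Relaxed is enough here and not enough for publishing.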
Pick by the operation
The level to choose depends on what the atomic is doing.
| Operation | Right choice | Why |
|---|---|---|
| Counter (hits, retries, log lines) | Relaxed | Order does not matter; only the count does |
| Publishing data behind a flag or pointer | Release on the store, Acquire on the load | The flag is the handoff; the pair makes the data visible |
| Algorithms that need every thread to agree on a single global order (Dekker, Peterson) | SeqCst | Anything weaker can produce histories the algorithm was not designed for |
For about 90% of lock-free code, acquire-release is the right answer. SeqCst is the safe default when there is doubt. Relaxed is reserved for measured hot paths where the only operation is "increment a counter".
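The SeqCst row is the least intuitive, so here is a sketch of the "store buffering" pattern that sits at the heart of Dekker-style algorithms. The function name `store_buffering_once` is hypothetical. Each thread raises its own flag and then checks the other's. With SeqCst, at least one thread must see the other's flag. With acquire-release or Relaxed, both threads may read 0.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;
use std::thread;

// Runs the two-thread pattern once and returns what each thread
// read from the *other* thread's flag.
fn store_buffering_once() -> (u32, u32) {
    let x = Arc::new(AtomicU32::new(0));
    let y = Arc::new(AtomicU32::new(0));

    let t1 = {
        let (x, y) = (Arc::clone(&x), Arc::clone(&y));
        thread::spawn(move || {
            x.store(1, Ordering::SeqCst); // raise my flag
            y.load(Ordering::SeqCst) // check the other flag
        })
    };
    let t2 = {
        let (x, y) = (Arc::clone(&x), Arc::clone(&y));
        thread::spawn(move || {
            y.store(1, Ordering::SeqCst); // raise my flag
            x.load(Ordering::SeqCst) // check the other flag
        })
    };

    (t1.join().unwrap(), t2.join().unwrap())
}
```

Under SeqCst the result `(0, 0)` is impossible, because all four operations fall into one global order and whichever store comes first in that order is visible to the other thread's later load. That is the property mutual-exclusion algorithms of this kind rely on.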
Why this is hard to get right
Memory-ordering bugs hide in plain sight. The code looks correct, the tests pass on the developer's laptop, and then production catches fire on a different CPU. The bug is not a wrong line. The bug is a missing dial setting on an atomic operation.
The reason this happens is that x86 has a strong memory model and ARM and POWER have weak ones. The same code behaves differently on each.
| Ordering | x86 (strong ordering) | ARM and POWER (weak ordering) |
|---|---|---|
| Relaxed | Often "works" by accident, hardware does not reorder much | Reorders aggressively; bugs surface fast |
| Acquire-release | Essentially free; ordinary loads and stores already behave this way | Needs an ordered instruction or a fence; cheap but not free |
| SeqCst | Loads are basically free, stores need a fence | Both sides need a fence; real per-operation cost |
Debugging happens on Intel laptops. Shipping targets ARM: servers like AWS Graviton, Apple Silicon machines, and mobile devices. Same code, different behaviour.
A few defensive habits help:
- Default to SeqCst unless there is a measured reason to relax.
- For lock-free code, write down which release pairs with which acquire, and which writes the pairing protects. If that note cannot be written, the design is wrong.
- Get a second pair of eyes on the reasoning. One person almost never catches every reordering.
- Test on a weakly-ordered machine (an ARM box or a Mac with Apple Silicon) before shipping.
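The "write down the pairing" habit can live directly in the code as comments. The sketch below is a hypothetical toy spin lock (`SpinCounter` and `run_spin_counter` are illustrative names, not a recommended lock) whose comments state which acquire pairs with which release and which write the pairing protects.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

// Toy spin lock guarding a value that is updated with a separate
// load and store, which would lose updates without the lock.
struct SpinCounter {
    locked: AtomicBool,
    value: AtomicU64,
}

impl SpinCounter {
    fn increment(&self) {
        // ACQUIRE on lock: pairs with the Release store in the previous
        // holder's unlock. Protects: that holder's write to `value`.
        while self
            .locked
            .compare_exchange_weak(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            std::hint::spin_loop();
        }

        // Critical section: a non-atomic read-modify-write sequence.
        let v = self.value.load(Ordering::Relaxed);
        self.value.store(v + 1, Ordering::Relaxed);

        // RELEASE on unlock: publishes the write to `value` to whichever
        // thread next acquires the lock.
        self.locked.store(false, Ordering::Release);
    }
}

fn run_spin_counter(threads: u64, per_thread: u64) -> u64 {
    let c = Arc::new(SpinCounter {
        locked: AtomicBool::new(false),
        value: AtomicU64::new(0),
    });
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let c = Arc::clone(&c);
            thread::spawn(move || (0..per_thread).for_each(|_| c.increment()))
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    c.value.load(Ordering::Relaxed)
}
```

If the two comments on `locked` could not be written, that would be the signal the list above describes: the design does not yet say who publishes what to whom.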
Why Java and Go do not expose these dials
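Both languages decided that the bug surface from per-operation ordering choices was not worth the speed gain. The everyday operations on Java's AtomicInteger, AtomicReference, and friends (get, set, incrementAndGet, compareAndSet) are SeqCst. Weaker access modes do exist (lazySet, and since Java 9 methods such as getAcquire and setRelease, plus VarHandle), but they are explicit opt-ins that application code rarely touches. Go's sync/atomic operations are sequentially consistent under the Go memory model (spelled out in its Go 1.19 revision) and are not user-tunable.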
For 99% of application code this is the right trade. The remaining 1% (HFT systems, custom lock-free data structures, kernel code) is where people reach for C++ or Rust to control the dials directly.
The takeaway
Think of release as "publish" and acquire as "subscribe". In lock-free code, find the release-acquire pair on each shared atomic, and ask which writes are being published from one side to the other. If that pair is not visible in the code, the code is broken.
Key points
- SeqCst: every thread sees the same global order of all SeqCst operations. Strongest, slowest.
- Acquire (on load) + Release (on store): publish-subscribe pattern. Writes before release are visible after acquire.
- Relaxed: atomic, but no ordering. Use for counters where order doesn't matter.
- Java atomics (by default) and Go's sync/atomic give SeqCst. C++ and Rust allow per-operation choice.
- Wrong relaxation = wrong code. SeqCst is the safe default if unsure; relax to acquire-release or Relaxed only after measuring AND reasoning carefully.
Follow-up questions
- Why doesn't Java let me choose memory ordering?
- When should I prefer acquire-release over SeqCst?
- Is relaxed ever wrong for counters?
- What is the cost difference between SeqCst and relaxed?
Gotchas
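- Mixing relaxed with code that depends on ordering = subtle bugs
- Acquire on a pure store / release on a pure load is invalid: Rust rejects it (a deny-by-default lint at compile time, a panic otherwise), and in C++ it is undefined behaviour rather than a guaranteed compiler error
- Java atomics are SeqCst by default; the weaker modes (lazySet, getAcquire, setRelease) are explicit opt-ins
- Memory ordering is ABOUT the compiler AND the CPU; both can reorder
- Reasoning by 'it works on my machine' fails on weakly-ordered ARM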