Treiber Stack: The Canonical Lock-Free Stack
Lock-free stack using CAS on the head pointer. Push: build new node pointing at current head, CAS to install. Pop: read head, CAS-replace with head.next. The textbook lock-free data structure; suffers from the ABA problem and from contention on the head.
What it is
The Treiber stack (R. Kent Treiber, 1986) is the original lock-free data structure ever published. It is a stack with two operations, push and pop, both built on a single atomic pointer (the head of a linked list) plus a CAS instruction.
It is the textbook example for lock-free programming. Simple enough to understand in one page, and rich enough to demonstrate every classic hazard at once: the ABA problem, memory reclamation, and cache-line contention.
A picture of the stack
The data structure is an ordinary singly linked list. The trick is that the head is an atomic pointer instead of a plain one. The CAS instruction operates on that single pointer; nothing else needs synchronization.
The atomic head is the only memory location that CAS touches. Every push and every pop is one CAS away from being done.
Push, step by step
A push has three steps:
- Allocate a new node holding the value to push.
- Set the new node's
nextfield to the current head. - CAS the head from the old head to the new node. If CAS fails, somebody else changed the head in the meantime; reload and retry from step 2.
A push of 99 onto head → A → B → C looks like this:
The CAS is the moment the push happens. Before CAS, the new node is invisible to anyone else, since no thread has a way to reach it yet. After CAS succeeds, every other thread that loads the head sees the new top of the stack.
Pop, step by step
A pop has four steps:
- Read the head into a local variable
oldHead. - If
oldHeadis null, the stack is empty; return. - CAS the head from
oldHeadtooldHead.next. If CAS fails, reload and retry from step 1. - Return
oldHead.value.
A pop from head → A → B → C looks like this:
The CAS is the linearization point of the pop. The pop appears to take effect at the exact instant the CAS succeeds. No lock, no blocking, no waiting on another thread.
Where things go wrong: the ABA problem
This is the textbook hazard for pointer-based lock-free code without a garbage collector.
Imagine Thread 1 starts a pop. It reads head = X, reads X.next, and is about to CAS. Before the CAS happens, Thread 2 races ahead. Thread 2 pops X (so X is freed), pops the node after X, allocates a new node, and the allocator hands back X's old address for that new node. Thread 2 pushes the new node. The head pointer is once again the same bit pattern as the X that Thread 1 read at the start, even though it is a different node living at the recycled address.
Thread 1 resumes and runs its CAS. The pointer comparison succeeds, because the head's bits match what Thread 1 saved. The CAS installs X.next (which Thread 1 read at the start) as the new head. But that pointer now refers to a node that Thread 2 already freed. The stack is corrupted, and any thread that touches the head next reads from reclaimed memory.
In garbage-collected languages like Java and Go, this does not happen in idiomatic code. As long as Thread 1 still holds oldHead in a local variable, the GC keeps the original X alive, and the allocator cannot put a different node at X's address. The CAS then sees a different pointer when Thread 2 has changed the head, fails correctly, and the loop retries.
In C and C++, the protection has to be added on purpose. The standard tools are tagged pointers (pack a version counter into the pointer's unused bits), hazard pointers (each thread publishes the pointers it is about to use), and epoch-based reclamation (delay every free until all threads have moved past the freeing epoch). Each adds real complexity. There is a separate lesson on the ABA problem that goes into all three.
Memory reclamation
After a pop succeeds, can the node be freed immediately?
In a GC language, yes. The GC will free it whenever no thread is holding a reference. The implementation does not need to think about this.
In C or C++, the answer is no. Another thread could still be holding the popped node's address from before the CAS, with an oldHead local variable that has not been touched yet. Freeing the node now risks a use-after-free in that other thread.
Three standard solutions, in increasing order of complexity:
- Hazard pointers. Each thread publishes the pointers it is about to dereference. Before freeing a node, the freer scans every thread's hazard list. If anyone has the node listed, the free is deferred.
- Epoch-based reclamation. A global epoch counter. Each thread reads the current epoch when it enters a critical section. Freed nodes wait in a quarantine bucket until every thread has advanced past the epoch in which they were freed. Used by Linux's RCU and Rust's crossbeam.
- Reference counting. Each node has an atomic counter. Increment on read, decrement on done, free at zero. Simple to reason about, slow under contention because every reader writes to the same counter.
Each one is a research topic on its own. For production C++ code, the right answer is to use a library (folly, libcds) instead of writing a custom one.
When to use a Treiber stack
Honestly, rarely.
For most "give me a stack" cases, a mutex-protected std::stack or equivalent is simpler and often faster than the lock-free version. Cache-line contention on the single head pointer makes the Treiber stack scale poorly when many threads push or pop concurrently. For high-throughput cases, an object pool, a sync.Pool-style reuse cache, or a vetted bounded queue from a library is almost always a better choice.
The Treiber stack is valuable as a teaching tool, not a building block. It illustrates the shape every lock-free algorithm shares (a CAS retry loop), the canonical hazard of pointer-based lock-free code (ABA), and the hardest part of doing this without a GC (memory reclamation).
The three things worth carrying forward from studying it:
- The CAS retry loop is the building block of nearly every lock-free algorithm.
- ABA is the canonical hazard of pointer-based lock-free code without a garbage collector.
- Memory reclamation is usually harder than the algorithm itself. In GC languages, it is free. In C and C++, it is most of the implementation work.
Understanding the Treiber stack is what unlocks the literature on Michael-Scott queues, hazard pointers, RCU, and lock-free hash maps. All of them are variations on the same three themes.
Implementations
Java's GC handles memory reclamation, so ABA is mostly avoided (a node address can't be reused until no thread holds a reference). The CAS on head is the only synchronisation needed.
1 public class TreiberStack<T> {
2 private final AtomicReference<Node<T>> head = new AtomicReference<>();
3
4 private record Node<T>(T value, Node<T> next) {}
5
6 public void push(T value) {
7 Node<T> oldHead, newHead;
8 do {
9 oldHead = head.get();
10 newHead = new Node<>(value, oldHead);
11 } while (!head.compareAndSet(oldHead, newHead));
12 }
13
14 public T pop() {
15 Node<T> oldHead, newHead;
16 do {
17 oldHead = head.get();
18 if (oldHead == null) return null;
19 newHead = oldHead.next();
20 } while (!head.compareAndSet(oldHead, newHead));
21 return oldHead.value();
22 }
23 }Key points
- •Push: allocate node, set node.next = head, CAS(&head, oldHead, node).
- •Pop: load head, then CAS(&head, oldHead, oldHead.next). Retry on CAS failure.
- •Vulnerable to ABA: between read and CAS, head can be popped, freed, and re-pushed at the same address.
- •Memory reclamation is the hard part: when is it safe to free a popped node? Other threads might still be reading it.
- •Single-cache-line contention on head: not great for many concurrent threads. Lock-free MPSC queues outperform for high contention.
Follow-up questions
▸Why is ABA a problem here?
▸Why isn't ABA a problem in Java/Go?
▸Is Treiber stack the best lock-free stack?
Gotchas
- !ABA: in C/C++ without GC or hazard pointers, popped nodes can be reused, breaking CAS
- !Memory reclamation: when is it safe to free a popped node?
- !Cache-line contention on head: scales poorly with many writers
- !Push allocates a node; allocator contention can dominate the lock-free win
- !Lock-free does NOT mean wait-free; threads can starve indefinitely under bad scheduling