Treiber Stack: The Canonical Lock-Free Stack

What it is

The Treiber stack (R. Kent Treiber, 1986) is the original lock-free data structure ever published. It is a stack with two operations, push and pop, both built on a single atomic pointer (the head of a linked list) plus a CAS instruction.

It is the textbook example for lock-free programming. Simple enough to understand in one page, and rich enough to demonstrate every classic hazard at once: the ABA problem, memory reclamation, and cache-line contention.

A picture of the stack

The data structure is an ordinary singly linked list. The trick is that the head is an atomic pointer instead of a plain one. The CAS instruction operates on that single pointer; nothing else needs synchronization.

The atomic head is the only memory location that CAS touches. Every push and every pop is one CAS away from being done.

Push, step by step

A push has three steps:

Allocate a new node holding the value to push.
Set the new node's next field to the current head.
CAS the head from the old head to the new node. If CAS fails, somebody else changed the head in the meantime; reload and retry from step 2.

A push of 99 onto head → A → B → C looks like this:

The CAS is the moment the push happens. Before CAS, the new node is invisible to anyone else, since no thread has a way to reach it yet. After CAS succeeds, every other thread that loads the head sees the new top of the stack.

Pop, step by step

A pop has four steps:

Read the head into a local variable oldHead.
If oldHead is null, the stack is empty; return.
CAS the head from oldHead to oldHead.next. If CAS fails, reload and retry from step 1.
Return oldHead.value.

A pop from head → A → B → C looks like this:

The CAS is the linearization point of the pop. The pop appears to take effect at the exact instant the CAS succeeds. No lock, no blocking, no waiting on another thread.

Where things go wrong: the ABA problem

This is the textbook hazard for pointer-based lock-free code without a garbage collector.

Imagine Thread 1 starts a pop. It reads head = X, reads X.next, and is about to CAS. Before the CAS happens, Thread 2 races ahead. Thread 2 pops X (so X is freed), pops the node after X, allocates a new node, and the allocator hands back X's old address for that new node. Thread 2 pushes the new node. The head pointer is once again the same bit pattern as the X that Thread 1 read at the start, even though it is a different node living at the recycled address.

Thread 1 resumes and runs its CAS. The pointer comparison succeeds, because the head's bits match what Thread 1 saved. The CAS installs X.next (which Thread 1 read at the start) as the new head. But that pointer now refers to a node that Thread 2 already freed. The stack is corrupted, and any thread that touches the head next reads from reclaimed memory.

In garbage-collected languages like Java and Go, this does not happen in idiomatic code. As long as Thread 1 still holds oldHead in a local variable, the GC keeps the original X alive, and the allocator cannot put a different node at X's address. The CAS then sees a different pointer when Thread 2 has changed the head, fails correctly, and the loop retries.

In C and C++, the protection has to be added on purpose. The standard tools are tagged pointers (pack a version counter into the pointer's unused bits), hazard pointers (each thread publishes the pointers it is about to use), and epoch-based reclamation (delay every free until all threads have moved past the freeing epoch). Each adds real complexity. There is a separate lesson on the ABA problem that goes into all three.

Memory reclamation

After a pop succeeds, can the node be freed immediately?

In a GC language, yes. The GC will free it whenever no thread is holding a reference. The implementation does not need to think about this.

In C or C++, the answer is no. Another thread could still be holding the popped node's address from before the CAS, with an oldHead local variable that has not been touched yet. Freeing the node now risks a use-after-free in that other thread.

Three standard solutions, in increasing order of complexity:

Hazard pointers. Each thread publishes the pointers it is about to dereference. Before freeing a node, the freer scans every thread's hazard list. If anyone has the node listed, the free is deferred.
Epoch-based reclamation. A global epoch counter. Each thread reads the current epoch when it enters a critical section. Freed nodes wait in a quarantine bucket until every thread has advanced past the epoch in which they were freed. Used by Linux's RCU and Rust's crossbeam.
Reference counting. Each node has an atomic counter. Increment on read, decrement on done, free at zero. Simple to reason about, slow under contention because every reader writes to the same counter.

Each one is a research topic on its own. For production C++ code, the right answer is to use a library (folly, libcds) instead of writing a custom one.

When to use a Treiber stack

Honestly, rarely.

For most "give me a stack" cases, a mutex-protected std::stack or equivalent is simpler and often faster than the lock-free version. Cache-line contention on the single head pointer makes the Treiber stack scale poorly when many threads push or pop concurrently. For high-throughput cases, an object pool, a sync.Pool-style reuse cache, or a vetted bounded queue from a library is almost always a better choice.

The Treiber stack is valuable as a teaching tool, not a building block. It illustrates the shape every lock-free algorithm shares (a CAS retry loop), the canonical hazard of pointer-based lock-free code (ABA), and the hardest part of doing this without a GC (memory reclamation).

The three things worth carrying forward from studying it:

The CAS retry loop is the building block of nearly every lock-free algorithm.
ABA is the canonical hazard of pointer-based lock-free code without a garbage collector.
Memory reclamation is usually harder than the algorithm itself. In GC languages, it is free. In C and C++, it is most of the implementation work.

Understanding the Treiber stack is what unlocks the literature on Michael-Scott queues, hazard pointers, RCU, and lock-free hash maps. All of them are variations on the same three themes.

What it is

A picture of the stack

The atomic head is the only memory location that CAS touches. Every push and every pop is one CAS away from being done.

Push, step by step

A push has three steps:

Allocate a new node holding the value to push.
Set the new node's next field to the current head.
CAS the head from the old head to the new node. If CAS fails, somebody else changed the head in the meantime; reload and retry from step 2.

A push of 99 onto head → A → B → C looks like this:

Pop, step by step

A pop has four steps:

Read the head into a local variable oldHead.
If oldHead is null, the stack is empty; return.
CAS the head from oldHead to oldHead.next. If CAS fails, reload and retry from step 1.
Return oldHead.value.

A pop from head → A → B → C looks like this:

The CAS is the linearization point of the pop. The pop appears to take effect at the exact instant the CAS succeeds. No lock, no blocking, no waiting on another thread.

Where things go wrong: the ABA problem

This is the textbook hazard for pointer-based lock-free code without a garbage collector.

Memory reclamation

After a pop succeeds, can the node be freed immediately?

In a GC language, yes. The GC will free it whenever no thread is holding a reference. The implementation does not need to think about this.

Three standard solutions, in increasing order of complexity:

Hazard pointers. Each thread publishes the pointers it is about to dereference. Before freeing a node, the freer scans every thread's hazard list. If anyone has the node listed, the free is deferred.
Epoch-based reclamation. A global epoch counter. Each thread reads the current epoch when it enters a critical section. Freed nodes wait in a quarantine bucket until every thread has advanced past the epoch in which they were freed. Used by Linux's RCU and Rust's crossbeam.
Reference counting. Each node has an atomic counter. Increment on read, decrement on done, free at zero. Simple to reason about, slow under contention because every reader writes to the same counter.

Each one is a research topic on its own. For production C++ code, the right answer is to use a library (folly, libcds) instead of writing a custom one.

When to use a Treiber stack

Honestly, rarely.

The three things worth carrying forward from studying it:

The CAS retry loop is the building block of nearly every lock-free algorithm.
ABA is the canonical hazard of pointer-based lock-free code without a garbage collector.
Memory reclamation is usually harder than the algorithm itself. In GC languages, it is free. In C and C++, it is most of the implementation work.

Understanding the Treiber stack is what unlocks the literature on Michael-Scott queues, hazard pointers, RCU, and lock-free hash maps. All of them are variations on the same three themes.

What it is

A picture of the stack

Push, step by step

Pop, step by step

Where things go wrong: the ABA problem

Memory reclamation

When to use a Treiber stack

Implementations

Key points

Follow-up questions

Gotchas

Related reading

Treiber Stack: The Canonical Lock-Free Stack

What it is

A picture of the stack

Push, step by step

Pop, step by step

Where things go wrong: the ABA problem

Memory reclamation

When to use a Treiber stack

Implementations

Key points

Follow-up questions

Gotchas

Related reading