Atomic Operations & CAS
Atomic operations are indivisible CPU instructions that read-modify-write a single memory location safely, without locks. Compare-and-swap (CAS) is the foundation: "if this value is still X, replace it with Y." It is the primitive from which every lock-free data structure is built.
The two ideas in plain English
Atomic operation: a CPU instruction that reads, modifies, and writes a piece of memory in one indivisible step. Nobody else can see a half-finished update. Nobody can sneak in between the read and the write.
Compare-and-swap (CAS): the most useful atomic operation. It says: "If this memory still holds the value I expect, replace it with my new value. If something changed, don't write anything, and tell me what's actually there."
CAS is the foundation for coordinator-free updates. No locks, no waiting, no kernel calls. Just one hardware instruction.
A picture of CAS in action
Initial value at memory address X: 5

Time  Thread 1                     Thread 2
----  --------                     --------
t1    load X = 5
t2    compute new = 6
t3    (about to CAS)               load X = 5
t4                                 compute new = 7
t5    CAS(X, expected=5, new=6)
      SUCCESS, X is now 6
t6                                 CAS(X, expected=5, new=7)
                                   FAIL: X is 6, not 5
                                   (the CAS returns the current value)
t7                                 load X = 6
t8                                 compute new = 8
t9                                 CAS(X, expected=6, new=8)
                                   SUCCESS, X is now 8
Thread 1 wins the first round. Thread 2's CAS sees that the memory changed (expected 5, found 6), so it doesn't overwrite. Thread 2 re-reads, recomputes, and tries again. Eventually one of them succeeds. No locks. No blocking.
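The diagram can be replayed with Java's AtomicInteger. This is a single-threaded replay (the interleaving is simulated, and the class and method names are made up for this sketch), but the success/fail pattern is exactly what the two threads would see:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CasReplay {
    static int replay() {
        AtomicInteger x = new AtomicInteger(5);

        boolean won  = x.compareAndSet(5, 6);  // Thread 1: expected 5 -> true, X = 6
        boolean lost = x.compareAndSet(5, 7);  // Thread 2: expected 5 -> false, X still 6

        int current = x.get();                 // Thread 2 re-reads: 6
        x.compareAndSet(current, current + 2); // recompute 8, retry -> true
        return x.get();                        // 8
    }

    public static void main(String[] args) {
        System.out.println(CasReplay.replay()); // prints 8
    }
}
```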
The retry-loop pattern
This is the shape every lock-free counter, queue, and hashmap uses internally:
loop:
    current = atomic.load(X)
    new = compute(current)
    if CAS(X, current, new) succeeded:
        return
    // someone else won; loop and try again
Want atomic increment? compute(current) = current + 1.
Want atomic max-of-seen? compute(current) = max(current, my_sample).
Want atomic "swap if even"? compute(current) = current % 2 == 0 ? new : current.
The loop turns one CAS instruction into any single-word atomic operation you need.
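Here is the "swap if even" variant written out in Java. The class and method names are invented for this sketch; modern Java can also express the whole loop in one call as `x.updateAndGet(cur -> cur % 2 == 0 ? replacement : cur)`:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SwapIfEven {
    static final AtomicInteger x = new AtomicInteger(4);

    // The generic retry loop, instantiated for "replace the value if it is even".
    static int setIfEven(int replacement) {
        while (true) {
            int current = x.get();
            if (current % 2 != 0) return current;              // odd: nothing to do
            if (x.compareAndSet(current, replacement)) return replacement;
            // CAS failed: another thread won; re-read and try again
        }
    }

    public static void main(String[] args) {
        System.out.println(setIfEven(9)); // 4 is even -> becomes 9
        System.out.println(setIfEven(7)); // 9 is odd  -> stays 9
    }
}
```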
What CAS cannot do
One memory location at a time. CAS works on a single word (or in some hardware, a single double-word). If the invariant spans two variables ("transfer 50 from account A to account B" or "update both head and tail of a queue"), CAS alone is not enough.
What CAS handles cleanly:              What CAS cannot do alone:
- Counters                             - Two-account money transfer
- Single-word state (flag, pointer)    - Update head AND tail together
- Lock-free stack head                 - Multi-field invariants
- Single hash bucket                   - Anything that needs >1 location
For multi-location updates: use a lock, or restructure the data so the invariant fits in one word (often: one atomic pointer to an immutable record that contains both fields).
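The "one atomic pointer to an immutable record" restructuring looks like this in Java. Both fields live in an immutable record, and a single CAS swaps the whole record, so they can never be observed half-updated. The names here are illustrative, not from the lesson:

```java
import java.util.concurrent.atomic.AtomicReference;

public class Window {
    record Bounds(int low, int high) {}  // immutable: both fields travel together

    static final AtomicReference<Bounds> bounds =
            new AtomicReference<>(new Bounds(0, 100));

    // Two-field invariant, maintained with a single-word CAS on the pointer.
    static void widen(int by) {
        while (true) {
            Bounds cur = bounds.get();
            Bounds next = new Bounds(cur.low() - by, cur.high() + by);
            if (bounds.compareAndSet(cur, next)) return; // both fields change at once
        }
    }
}
```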
When CAS gets expensive: contention
A CAS retry loop spins. If 16 threads are all CAS-ing the same AtomicLong 100,000 times per second, most CAS attempts will fail. The CPUs go to 100% with little throughput.
Low contention (few writers):      High contention (many writers):
T1: CAS succeeds, return           T1: CAS fails, retry
T2: CAS succeeds, return           T2: CAS fails, retry
T1: CAS succeeds, return           T3: CAS succeeds
...                                T1: CAS fails again, retry
                                   T2: CAS fails again, retry
                                   ...
Throughput: 100M ops/sec           Throughput: 1M ops/sec
                                   CPU: pegged at 100%
The fix is to remove the contention, not to make CAS faster:
- Stripe the counter. Split it across N cells, one per CPU. LongAdder does this: each thread bumps its own cell, and the total is the sum of the cells, computed on read. Trades read cost for write throughput.
- Back off. If the data structure can't be changed, sleep a small (random, exponentially growing) amount between failed retries to spread out the contention.
- Fall back to a lock. Sometimes a quick lock is cheaper than 50 failed CAS attempts.
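The backoff option can be sketched as a CAS loop that pauses a random, doubling amount after each failure. The pause bounds (1 µs start, 1 ms cap) are illustrative, not tuned values:

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.LockSupport;

public class BackoffCounter {
    static final AtomicLong counter = new AtomicLong();

    // CAS retry loop with randomized exponential backoff between failures.
    static long increment() {
        long pauseNanos = 1_000;                         // start around 1 microsecond
        while (true) {
            long cur = counter.get();
            if (counter.compareAndSet(cur, cur + 1)) return cur + 1;
            // lost the race: sleep a random slice of the window, then double it
            LockSupport.parkNanos(ThreadLocalRandom.current().nextLong(1, pauseNanos + 1));
            pauseNanos = Math.min(pauseNanos * 2, 1_000_000); // cap around 1 ms
        }
    }
}
```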
The ABA problem in one sentence
If memory holds A, changes to B, then back to A, the CAS sees "still A" and assumes nothing changed. But the world did change in between. Covered in detail in its own lesson; the short summary: garbage-collected languages mostly dodge it; C/C++ and code with object pools have to defend with tagged pointers, hazard pointers, or epoch reclamation.
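The tagged-pointer defence can be glimpsed with Java's AtomicStampedReference, which makes the CAS match both the reference and a version stamp, so A → B → A leaves a different stamp and the stale CAS fails. A minimal sketch (the class and helper names are invented; ABA gets its own lesson):

```java
import java.util.concurrent.atomic.AtomicStampedReference;

public class Tagged {
    static final AtomicStampedReference<String> top =
            new AtomicStampedReference<>("A", 0);

    // Succeeds only if BOTH the reference and the stamp match.
    static boolean casTop(String expect, int expectStamp, String next) {
        return top.compareAndSet(expect, next, expectStamp, expectStamp + 1);
    }

    public static void main(String[] args) {
        casTop("A", 0, "B");                     // A -> B, stamp 0 -> 1
        casTop("B", 1, "A");                     // B -> A, stamp 1 -> 2
        System.out.println(casTop("A", 0, "C")); // false: value is "A" again, stamp isn't 0
    }
}
```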
Picking the right tool
Single counter, mostly one writer
----------------------------------
USE AtomicLong / atomic.Int64 / a single CAS
Single counter, many concurrent writers
---------------------------------------
USE LongAdder / sharded counter
Multi-field invariant
---------------------
USE a Mutex / synchronized block
Read-mostly state, hot-swappable
--------------------------------
USE atomic pointer to an immutable record
The mental model
Atomics are mutual exclusion on a single word, at hardware speed. They're the cheapest way to coordinate when the data fits in one location. They're also the deepest building block: every lock in common use is implemented with atomic operations under the hood.
The skill is knowing when an atomic is enough (one word, one simple update) and when to step up to a lock (multi-field invariant, long critical section, complex condition).
Implementations
Every method on AtomicInteger is one atomic CPU instruction (or a CAS retry loop on top of one). No lock, no contention storm. The hardware guarantees the read-modify-write isn't interrupted.
AtomicInteger counter = new AtomicInteger(0);

counter.incrementAndGet();          // atomic ++; uses CAS internally
counter.addAndGet(5);               // atomic += 5
int old = counter.getAndSet(0);     // atomic swap

// The most powerful primitive, explicit CAS:
// "set to 100 only if still 5"
boolean success = counter.compareAndSet(5, 100);

AtomicInteger doesn't have an atomic max, but one can be built with CAS. Read current, compute new, CAS to install. If another thread raced in, CAS fails and the loop retries with the new current value. This is how every lock-free operation is built.
AtomicInteger highWaterMark = new AtomicInteger(0);

void recordSample(int sample) {
    int current;
    do {
        current = highWaterMark.get();
        if (sample <= current) return; // already higher; no-op
    } while (!highWaterMark.compareAndSet(current, sample));
}

Under heavy contention, every CAS on the same AtomicLong fails repeatedly and threads spin retrying. LongAdder shards internally across N counters, one per thread, summing on read. Trades memory for throughput.
// Hot counter under heavy contention: AtomicLong becomes a bottleneck
AtomicLong requestCount = new AtomicLong();
requestCount.incrementAndGet();  // every thread CAS's the same word

// LongAdder: internally sharded, much higher throughput
LongAdder adder = new LongAdder();
adder.increment();               // mostly hits a thread-local striped cell
long total = adder.sum();        // aggregates on read; eventually consistent

Key points
- An atomic operation is a single CPU instruction that can't be interrupted partway through
- CAS = compare-and-swap: "if memory[addr] == expected, write new and return success; else fail and return current value"
- Lock-free programming uses CAS in retry loops, never blocks, but may spin under contention
- Atomics give single-variable safety; multi-variable invariants still need a lock
- ABA problem: a value goes A → B → A in between observations; CAS thinks nothing changed
- LongAdder/striped counters trade memory for throughput under high contention
Tradeoffs
| Option | Pros | Cons | When to use |
|---|---|---|---|
| Atomic primitive (AtomicInteger / atomic.Int64) | One hardware instruction; no locks, no kernel calls | Single word only; the update must fit one location | Counters, flags, lock-free queues, stack pointers |
| CAS retry loop | Turns one CAS into any single-word atomic operation | Spins under contention; hand-rolled loops risk ABA | Custom atomic operations not in the stdlib |
| LongAdder / striped counter | High write throughput under heavy contention | Extra memory per cell; reads sum the cells and are eventually consistent | Hot counters with many writer threads, infrequent reads, request counts, metrics |
| Mutex | Protects multi-variable invariants and long critical sections | Blocking; heavier than a single atomic op | Anything more than one variable; long critical sections |
Follow-up questions
- What's the ABA problem?
- When does a CAS retry loop become a busy-wait?
- Are AtomicReference assignments atomic for compound operations?
- Why is volatile not enough for counter++?
- How does CAS work at the CPU level?
Gotchas
- ABA problem on linked-list CAS: the value comes back to the original after an intermediate change
- Reading an atomic field non-atomically (`int x = atomic.Load(); other code; use x`): the value can change between the read and the use
- AtomicInteger.incrementAndGet() is atomic; AtomicInteger.get() + 1 followed by set() is NOT
- A hot CAS loop on a single AtomicLong from many threads is a CPU bottleneck; switch to LongAdder
- Java: AtomicInteger.lazySet vs set: lazySet doesn't establish happens-before and can lose visibility
- Go: atomic operations take a pointer to the value (& prefix); operating on a copy of the value is silently wrong
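The get()-then-set() gotcha is easy to demonstrate: two separately atomic steps are not one atomic read-modify-write, so concurrent increments get lost. A small demo (class name invented for this sketch):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class LostUpdates {
    static int[] run() {
        AtomicInteger broken = new AtomicInteger();
        AtomicInteger correct = new AtomicInteger();
        Runnable task = () -> {
            for (int i = 0; i < 100_000; i++) {
                broken.set(broken.get() + 1); // read and write: two atomic steps, not one
                correct.incrementAndGet();    // one atomic read-modify-write
            }
        };
        Thread a = new Thread(task), b = new Thread(task);
        a.start(); b.start();
        try { a.join(); b.join(); } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new int[] { broken.get(), correct.get() };
    }

    public static void main(String[] args) {
        int[] r = run();
        System.out.println("broken:  " + r[0]); // usually less than 200000
        System.out.println("correct: " + r[1]); // always 200000
    }
}
```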
Common pitfalls
- Building lock-free data structures by hand without understanding ABA
- Using AtomicReference for what should be a Lock, multi-variable invariants slip through
- Spinning forever on a CAS loop without backoff under contention
- Mixing atomic and non-atomic access to the same variable
Practice problems
AtomicInteger.incrementAndGet, or CAS retry loop on AtomicReference, or LongAdder for contention
Linked list with atomic head pointer; push/pop use CAS retry to install new head
CAS retry loop: read current, check if new is larger, CAS to install
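The lock-free stack from the practice list is the classic Treiber stack: every operation is a CAS retry on a single atomic head pointer. A minimal sketch (omitting the ABA defences a production version would need in non-GC languages):

```java
import java.util.concurrent.atomic.AtomicReference;

public class TreiberStack<T> {
    private record Node<U>(U value, Node<U> next) {}

    private final AtomicReference<Node<T>> head = new AtomicReference<>();

    public void push(T value) {
        Node<T> cur, node;
        do {
            cur = head.get();
            node = new Node<>(value, cur);      // new node points at the old head
        } while (!head.compareAndSet(cur, node)); // retry if the head moved
    }

    public T pop() {
        Node<T> cur;
        do {
            cur = head.get();
            if (cur == null) return null;       // empty stack
        } while (!head.compareAndSet(cur, cur.next()));
        return cur.value();
    }
}
```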
APIs worth memorising
- Java: AtomicInteger, AtomicLong, AtomicReference, AtomicMarkableReference, LongAdder, VarHandle
- Go: sync/atomic.{Int32, Int64, Uint64, Bool, Pointer, Value}.{Load, Store, Swap, Add, CompareAndSwap}
- Python: no atomics in stdlib, use threading.Lock or multiprocessing.Value(c_int)
Every concurrent runtime relies on these. Java's ConcurrentHashMap uses CAS for resize and bin updates. The Linux kernel uses atomic ops everywhere; RCU is built on them. Go's runtime uses CAS for goroutine scheduling. Most interview-level lock-free queues are CAS-based Treiber stacks or Michael-Scott queues.