LMAX Disruptor
A pre-allocated ring buffer plus per-consumer sequence cursors. Producers claim slots with CAS; consumers wait until the producer cursor advances past their position. Slots are reused (no per-message allocation), data lives in cache lines under explicit control (no false sharing), and the wait strategy is pluggable (busy spin, yield, block). Built by LMAX for trading, hits 25M+ messages per second per pipeline.
The Disruptor in plain English
Picture a circular conveyor belt with 1024 slots. The producer drops items into slots, walking around the ring. The consumer picks items out of slots, walking around the same ring, a little behind the producer. Both walkers keep track of which slot they are at right now using a number called a cursor. That is the entire data structure: a ring of slots and two cursors. There are no locks anywhere.
The "done" slots are behind the consumer, already taken. The "ready" slots are between the two cursors, waiting to be consumed. The "empty" slots are ahead of the producer, free for the next round. The gap between the two cursors is the backlog of items waiting to be consumed. The maximum gap that can ever exist is the ring size itself; if the producer pulls one full lap ahead, it would overwrite a slot the consumer has not read yet, so it must wait.
The producer publishes a new item by writing into the slot at producerCursor + 1 and then bumping producerCursor. The consumer reads from consumerCursor + 1, then bumps its own cursor. No locks, no per-message allocation (the slots are pre-allocated objects that get reused forever), no cache-line bouncing (each cursor sits alone in its own 64-byte cache line so writes from one core do not invalidate the other core's cached copy).
Why it is fast
The classic point of comparison is LinkedBlockingQueue or any other standard locked queue from java.util.concurrent. The Disruptor wins on three independent things at once.
| Per-message cost | LinkedBlockingQueue | LMAX Disruptor |
|---|---|---|
| Allocation | One new node per put | Zero, slots are pre-allocated and reused |
| Lock | Acquire on every put and take | None, atomic cursor read/write |
| Memory layout | Scattered nodes on the heap | One contiguous 64KB ring (1024 × 64 bytes) |
| Garbage on hot path | One node every message, GC churn | None |
| Cross-core coordination | Lock contention, parking, kernel transitions | Cache-friendly atomic cursor advances |
Each row helps a little; together they compound. The result on hot paths is around ten to a hundred times higher throughput than a locked queue. The LMAX trading exchange itself runs around six million trades per second through this design. Log4j2's async logger uses the same library and pushes around eighteen million events per second.
When this is the right tool
The Disruptor is a specialised tool. It pays off when all of these are true:
- The workload moves more than around a million messages per second through a single pipeline.
- Tail latency matters, that is, the goal is a tight p99 or p99.9, not just an acceptable average.
- The work per message is small enough that queue overhead is a real fraction of total time.
For a typical web service, a CRUD application, or a request handler, none of those are true. The right choice in those cases is a LinkedBlockingQueue, an ArrayBlockingQueue, or a Java 21 virtual-thread-per-request model. They are simpler, well-understood, and fast enough. The Disruptor's complexity is only worth it under sustained, high-rate, low-latency workloads.
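For contrast, the "fast enough" baseline for those workloads is just a bounded blocking queue. A minimal sketch (class and message names here are illustrative):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BaselineQueue {
    public static void main(String[] args) throws InterruptedException {
        // Bounded, lock-based, array-backed: built-in back-pressure, fine for typical request rates.
        BlockingQueue<String> work = new ArrayBlockingQueue<>(1024);

        Thread consumer = Thread.ofPlatform().start(() -> {
            try {
                while (true) {
                    String msg = work.take();          // parks when the queue is empty
                    System.out.println("handled " + msg);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();    // shut down quietly
            }
        });

        work.put("request-1");                         // blocks when the queue is full
        work.put("request-2");
        Thread.sleep(100);
        consumer.interrupt();
    }
}
```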
How the cursors stay in sync without locks
The whole design is held together by two invariants. As long as both invariants are kept, no lock is needed.
| Invariant | What it prevents |
|---|---|
| `consumerCursor <= producerCursor` | The consumer reading a slot the producer has not written |
| `producerCursor - consumerCursor <= SIZE` | The producer overwriting a slot the consumer has not read |
The producer's protocol for publishing slot N is three steps:
- Wait until `N - consumerCursor <= SIZE`. There is room in the ring.
- Write the data into `slots[N & MASK]`. The mask trick (where `MASK = SIZE - 1`) replaces a modulo with a single AND because `SIZE` is a power of two.
- Atomically set `producerCursor = N` with release semantics. This is the publish.
The consumer's protocol for consuming slot N is also three steps:
- Wait until `N <= producerCursor`. The producer has written the slot.
- Read `slots[N & MASK]` with acquire semantics.
- Atomically set `consumerCursor = N`. The slot is now free for the producer to reuse on its next lap.
The atomic write to producerCursor is the publish. Any consumer that observes the new cursor value is guaranteed to also see the slot data, because the slot write happens-before the cursor write in program order, and the release-acquire pair on the cursor makes that ordering visible across cores. No locks, no allocations, just two atomic numbers and a chunk of contiguous memory.
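To make the release/acquire pairing concrete, here is a minimal single-producer sketch using explicit VarHandle access modes; the class and field names are illustrative, and the AtomicLong examples under Implementations get the same guarantee from plain volatile get/set:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

class PublishIdiom {
    static final int SIZE = 1024, MASK = SIZE - 1;
    final long[] slots = new long[SIZE];
    long producerCursor = -1;                       // accessed only via the VarHandle below

    static final VarHandle CURSOR;
    static {
        try {
            CURSOR = MethodHandles.lookup()
                    .findVarHandle(PublishIdiom.class, "producerCursor", long.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    void publish(long seq, long value) {
        slots[(int) (seq & MASK)] = value;          // 1. write the slot (plain store)
        CURSOR.setRelease(this, seq);               // 2. release-store the cursor: the publish
    }

    long read(long seq) {
        while ((long) CURSOR.getAcquire(this) < seq) Thread.onSpinWait(); // acquire pairs with the release
        return slots[(int) (seq & MASK)];           // guaranteed to observe the slot write
    }
}
```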
Wait strategies
How a thread waits when it cannot make progress (the consumer waiting for the producer, or the producer waiting for slot reclamation) is configurable. The four common strategies trade CPU usage against latency; the list below runs from lowest latency (most CPU) to lowest CPU (highest latency):
- BusySpinWaitStrategy. A tight `while (!ready) Thread.onSpinWait();` loop. Microsecond-level latency, eats 100% of a core while waiting. Only sensible on a dedicated core in a latency-critical system.
- YieldingWaitStrategy. Spin a few times, then call `Thread.yield()`. Low latency, less CPU-greedy than busy spin. The sensible default for most low-latency systems.
- SleepingWaitStrategy. Spin briefly, then `LockSupport.parkNanos(1)`. Higher latency, near-zero CPU when idle. Good for cold or low-rate consumers.
- BlockingWaitStrategy. Park on a `Condition` until the producer signals. Highest latency, lowest CPU. Use for consumers that are expected to be cold most of the time, mixed with hot ones in the same pipeline.
The mistake people make is picking BusySpinWaitStrategy because it sounds fast, then watching the rest of the services on the box starve for CPU. Always pick the wait strategy under a representative load and look at the impact on neighbours, not just on the Disruptor itself.
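For intuition about the middle of that spectrum, here is a hand-rolled escalation in the spirit of SleepingWaitStrategy; the thresholds and method name are illustrative, not the library's code:

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.LockSupport;

final class EscalatingWait {
    // Wait until the cursor reaches 'wanted': spin, then yield, then park.
    static long awaitSequence(long wanted, AtomicLong cursor) {
        int counter = 200;
        long available;
        while ((available = cursor.get()) < wanted) {
            if (counter > 100)    { Thread.onSpinWait(); counter--; }   // stage 1: burn a few cycles
            else if (counter > 0) { Thread.yield();      counter--; }   // stage 2: give up the time slice
            else                  { LockSupport.parkNanos(1L); }        // stage 3: park, near-zero CPU
        }
        return available;
    }
}
```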
Pre-allocation cuts both ways
Slots hold references. Putting a reference to a large object into a slot prevents that object from being garbage-collected until the slot is overwritten on the next lap. For events that contain only primitive fields (longs, fixed-width records), this is fine. For events that hold byte[] payloads, large String references, or other heap-heavy data, a lot of garbage stays pinned in the ring even though the consumer is "done" with it. Either size the ring carefully so the lap time is short, or explicitly null out the heavy fields once a consumer is finished with the slot.
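One way to do the explicit clearing is a final handler stage whose only job is to null the heavy fields. A minimal sketch against the LMAX EventHandler interface; the event and handler names are illustrative:

```java
import com.lmax.disruptor.EventHandler;

class PayloadEvent {
    byte[] payload;                  // heap-heavy reference held by the slot
    void clear() { payload = null; }
}

// Last handler in the chain: drop the reference so the ring does not pin the payload
// until the producer laps this slot again.
class ClearingHandler implements EventHandler<PayloadEvent> {
    @Override
    public void onEvent(PayloadEvent event, long sequence, boolean endOfBatch) {
        event.clear();
    }
}

// Wiring (illustrative): disruptor.handleEventsWith(businessHandler).then(new ClearingHandler());
```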
Implementations
The simplest Disruptor variant: one producer, one consumer, fixed-size ring buffer of pre-allocated event objects. Producer claims the next slot, fills it, publishes by advancing the cursor. Consumer waits for the cursor, then reads. No locks, no per-event allocation. Padding around the cursors prevents false sharing between producer and consumer caches.
```java
import java.util.concurrent.atomic.AtomicLong;

class Event { long value; } // mutable, reused

// PaddedAtomicLong: AtomicLong plus unused longs to fill a 64-byte cache line
// (a more robust alternative is jdk.internal.vm.annotation.@Contended; see below)
class PaddedAtomicLong extends AtomicLong {
    volatile long p1, p2, p3, p4, p5, p6, p7; // filler so the two cursors never share a line
    PaddedAtomicLong(long initial) { super(initial); }
}

class SpscRingBuffer {
    private static final int SIZE = 1024; // power of two
    private static final int MASK = SIZE - 1;
    private final Event[] slots = new Event[SIZE];

    // Padded cursors: each AtomicLong sits alone in its cache line
    private final PaddedAtomicLong producerCursor = new PaddedAtomicLong(-1);
    private final PaddedAtomicLong consumerCursor = new PaddedAtomicLong(-1);

    SpscRingBuffer() { for (int i = 0; i < SIZE; i++) slots[i] = new Event(); }

    // Producer
    void publish(long value) {
        long seq = producerCursor.get() + 1;
        while (seq - consumerCursor.get() > SIZE) Thread.onSpinWait(); // wait for room
        slots[(int) (seq & MASK)].value = value;
        producerCursor.set(seq); // release: makes the slot write visible to the consumer
    }

    // Consumer
    long consume() {
        long seq = consumerCursor.get() + 1;
        while (seq > producerCursor.get()) Thread.onSpinWait(); // wait for data
        long v = slots[(int) (seq & MASK)].value;
        consumerCursor.set(seq); // release the slot for reuse
        return v;
    }
}
```

With multiple producers, each one CASes the producer cursor to its claimed sequence. The catch: a producer that claims sequence N may publish after a producer that claimed N+1, so consumers can't just read producerCursor and assume everything below is published. The Disruptor solves this with an "available buffer" array tracking which sequences are actually published.
```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch only; the real Disruptor adds an availableBuffer for visibility tracking.
class MpscRingBuffer {
    private static final int SIZE = 1024;
    private static final int MASK = SIZE - 1;
    private final Event[] slots = new Event[SIZE]; // Event as in the SPSC example
    private final AtomicLong cursor = new AtomicLong(-1);
    private final AtomicLong consumerCursor = new AtomicLong(-1);

    MpscRingBuffer() { for (int i = 0; i < SIZE; i++) slots[i] = new Event(); }

    long claim() {
        while (true) {
            long current = cursor.get();
            long seq = current + 1;
            if (seq - consumerCursor.get() > SIZE) {
                Thread.onSpinWait(); // no room; wait for the consumer to catch up
                continue;
            }
            if (cursor.compareAndSet(current, seq)) { // try to claim seq
                return seq;
            }
            // CAS lost to another producer; loop and try the next sequence
        }
    }

    void publish(long seq, long value) {
        slots[(int) (seq & MASK)].value = value;
        // Real Disruptor: mark availableBuffer[seq & MASK] = (seq >>> log2(SIZE));
        // consumers scan availableBuffer to find the highest fully-published sequence.
    }
}
```

In production, use the LMAX library rather than rolling a custom one. The API exposes the ring buffer through an EventFactory (slot allocator), an EventTranslator (producer-side fill), and an EventHandler (consumer-side process). Wait strategies and producer types are configuration, not code.
```java
// Maven: com.lmax:disruptor:4.0.0
import com.lmax.disruptor.*;
import com.lmax.disruptor.dsl.Disruptor;
import com.lmax.disruptor.dsl.ProducerType;

class TradeEvent { String symbol; double price; long qty; }

// Slot factory: pre-allocate the events
EventFactory<TradeEvent> factory = TradeEvent::new;

Disruptor<TradeEvent> disruptor = new Disruptor<>(
    factory,
    1024,                                                 // ring size, power of two
    Thread.ofPlatform().name("disruptor-", 0).factory(),
    ProducerType.SINGLE,                                  // SINGLE is much faster
    new YieldingWaitStrategy()
);

// Pipeline: handler1 -> handler2 (handler2 sees handler1's writes)
disruptor.handleEventsWith((event, seq, end) -> riskCheck(event))
         .then((event, seq, end) -> persist(event));
disruptor.start();

// Publish (producer side)
RingBuffer<TradeEvent> rb = disruptor.getRingBuffer();
rb.publishEvent((event, seq) -> {
    event.symbol = "AAPL";
    event.price = 235.10;
    event.qty = 100;
});

// No allocations on the hot path; events recycle around the ring.
```

Two unrelated AtomicLongs in the same cache line cause writes from one core to invalidate the other core's cached copy. Result: throughput drops 5-10x. Padding to a full cache line (64 bytes on x86) fixes it. JDK 8+ has @Contended for this.
```java
import jdk.internal.vm.annotation.Contended;

// BAD: false sharing -- the two hot counters can land in the same 64-byte line
class UnpaddedCursors {
    volatile long producer;
    volatile long consumer;
}

// GOOD: @Contended pads each field into its own cache line
class PaddedCursors {
    @Contended volatile long producer;
    @Contended volatile long consumer;
}

// Notes:
// - The annotation is sun.misc.Contended on JDK 8 and jdk.internal.vm.annotation.Contended on JDK 9+.
// - Compiling against it needs --add-exports java.base/jdk.internal.vm.annotation=ALL-UNNAMED,
//   and the JVM only honours it outside the JDK with -XX:-RestrictContended.
// - Padding a *reference* field to an AtomicLong does not help: the two AtomicLong objects can still
//   share a line. Pad the counter's storage itself (as above, or with hand-rolled filler longs).
```

Key points
- Ring buffer of fixed size (must be a power of two so `index = sequence & (size - 1)` is one AND instruction).
- Slots are pre-allocated objects that get reused; producers fill in fields, consumers read them. Zero per-message allocation.
- Producer cursor: an AtomicLong (cache-line padded) holding the highest published sequence. Consumers spin until their target sequence <= producer cursor.
- Consumer cursor: each consumer has its own cursor. Producers wait if the slowest consumer is too far behind (overwrites would lose data).
- Wait strategies are pluggable: BusySpinWaitStrategy (lowest latency, burns CPU), YieldingWaitStrategy (spin then yield), BlockingWaitStrategy (park on a condition).
- False sharing is the silent killer; every cursor is padded to occupy its own cache line.
- Single-producer mode skips the producer-side CAS and is much faster than multi-producer mode.
Follow-up questions
- Why is a ring buffer so much faster than a BlockingQueue?
- Why power of two for the ring size?
- What happens when the slowest consumer falls too far behind?
- Single-producer vs multi-producer: how big is the difference?
- When is the Disruptor the wrong choice?
Gotchas
- Forgetting padding. The single most common mistake: a "simple ring buffer" loses 80% of its throughput to false sharing between cursors.
- Using an Object[] of freshly allocated objects instead of pre-allocated slot fields. Defeats the no-allocation property.
- Multi-producer mode without availability tracking: consumers will read uninitialized slots (see the sketch after this list).
- Wait strategies that burn CPU in production. BusySpinWaitStrategy is for dedicated cores; on shared boxes it starves other work.
- Treating the ring as a dynamically resizable queue. The size is fixed at construction; pick it based on maximum in-flight, not average.
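To make the availability-tracking point concrete, here is a sketch of the lap-stamping idea behind the real multi-producer sequencer's available buffer; the class and method names are illustrative:

```java
import java.util.concurrent.atomic.AtomicIntegerArray;

class AvailableBuffer {
    static final int SIZE = 1024, MASK = SIZE - 1, SHIFT = Integer.numberOfTrailingZeros(SIZE);
    private final AtomicIntegerArray available = new AtomicIntegerArray(SIZE);

    AvailableBuffer() { for (int i = 0; i < SIZE; i++) available.set(i, -1); } // -1 = never published

    // Producer, after filling the slot: stamp it with its lap number. This is the publish.
    void setAvailable(long seq) {
        available.set((int) (seq & MASK), (int) (seq >>> SHIFT));   // volatile store
    }

    // A sequence is published once its slot carries the matching lap number.
    boolean isAvailable(long seq) {
        return available.get((int) (seq & MASK)) == (int) (seq >>> SHIFT);
    }

    // Consumers read up to the highest *contiguous* published sequence, never past a gap.
    long highestPublished(long from, long claimed) {
        for (long s = from; s <= claimed; s++) {
            if (!isAvailable(s)) return s - 1;
        }
        return claimed;
    }
}
```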
Common pitfalls
- Building a Disruptor for a workload that doesn't need it. The library has nontrivial complexity; the win only shows up under sustained, high-rate, low-latency workloads (HFT, market data, log shipping).
- Pinning consumer threads incorrectly. Disruptor benefits from CPU pinning, but pinning the wrong thread to a shared core ruins it.
- Mixing Disruptor with reflection-based serialization. Per-event Kryo or Jackson eats the latency budget; pre-marshal upstream.
APIs worth memorising
- com.lmax.disruptor.RingBuffer
- com.lmax.disruptor.dsl.Disruptor
- com.lmax.disruptor.WaitStrategy (BlockingWaitStrategy, YieldingWaitStrategy, BusySpinWaitStrategy, SleepingWaitStrategy)
- jdk.internal.vm.annotation.Contended
Real-world usage
LMAX trading exchange (the original use case, ~6M trades per second). Log4j2's async logger uses Disruptor for 18M events per second. Apache Storm uses ring-buffer-style transport. Many HFT firms have internal Disruptor variants. Outside of trading and ultra-high-throughput logging, the pattern is rare; most teams reach for a BlockingQueue or a Kafka topic.