RCU: Read-Copy-Update
Read-Copy-Update: writers create a new copy and atomically swap the pointer; readers see either the old or the new, never a partial state. Reads are nearly free (often just a pointer load). Writers are slower but rare. Used heavily in the Linux kernel for read-mostly data structures.
The library-book analogy
Picture a town library with one shelf set aside for the official phone directory. People walk in all day, grab the directory, look up a number, put it back, and leave. These are the readers. Once a week, the librarian receives a new edition with a few corrections. The librarian is the writer.
The naive way to update the directory would be to make every reader stop, snatch the book away, write the corrections in place, and let everyone resume. That blocks every reader for the duration of the update. RCU does something different.
- The librarian prints a complete new edition off-site, with the corrections already applied. No reader has any idea this is happening.
- The librarian walks to the shelf and swaps the new edition in. The old edition is moved into a back room. Anyone who walks in from this point sees the new edition.
- Anyone who happened to already be reading the old edition keeps reading it; nothing has changed for them. The librarian waits until every one of those readers has finished and put the old edition back.
- Once nobody is reading the old edition, the librarian throws it away.
That is RCU. The full name spells it out: Read, Copy, Update. Readers read freely. Writers prepare a new copy and swap it in. Old copies sit in the back room until no reader is using them, then they go away.
The waiting period in step 3 is called the grace period. It is the only piece of the protocol that needs special machinery; everything else is "do nothing different than usual".
A picture of the pattern
There is a single shared pointer that every reader loads and dereferences. At any moment that pointer points at one immutable copy of the data. The writer never modifies the existing copy. Instead, the writer prepares a new copy, swaps the pointer to point at the new copy, and lets the old copy stay alive in memory until everyone who already loaded the old pointer is done with it.
Steady state. The shared pointer points at the current copy. All readers dereference it and see the same value.
The cost on the read path is a single pointer load. Nothing else is needed because the data behind the pointer never changes while the pointer is pointing at it.
During an update. The writer allocates a new copy with the new values, then atomically swaps the shared pointer to the new copy. Readers that load the pointer after the swap see the new copy. Readers that loaded the pointer before the swap still hold a reference to the old copy and keep working with it.
No reader ever sees a half-updated value, because no copy is ever modified after it is published. Each reader holds a fully-formed snapshot.
Reclaiming the old copy. Once every reader that started before the swap has finished, the old copy has no outstanding readers and the writer can free it. The window between the swap and that moment is the grace period.
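In a GC language, the entire picture above collapses to an atomic reference holding an immutable object. A minimal Java sketch; the `Settings` record and its fields are illustrative, not from any particular library:

```java
import java.util.concurrent.atomic.AtomicReference;

public class RcuSketch {
    // One immutable copy of the data; never modified after publication.
    record Settings(int timeoutMs, String endpoint) {}

    // The single shared pointer that readers load and writers swap.
    static final AtomicReference<Settings> current =
            new AtomicReference<>(new Settings(500, "https://old.example"));

    public static void main(String[] args) {
        // A reader loads the pointer once and holds a snapshot.
        Settings snapshot = current.get();

        // The writer publishes a new immutable copy with one atomic swap.
        current.set(new Settings(250, "https://new.example"));

        // The old reader still sees its fully-formed old copy...
        System.out.println(snapshot.timeoutMs());      // 500
        // ...while anyone loading the pointer now sees the new copy.
        System.out.println(current.get().timeoutMs()); // 250
    }
}
```

The GC plays the role of the back room: the old `Settings` stays alive as long as `snapshot` references it, and is collected afterwards.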
The reader and writer protocols
Reads run constantly and have to be cheap. Writes run rarely and can afford to be slower.
Reader protocol:
- Mark the start of a "reader region". In the Linux kernel this is rcu_read_lock(). In a GC language, this is just holding a local reference to the loaded pointer.
- Load the pointer.
- Use the data through that pointer. It will not change underneath the reader, because nothing modifies a published copy.
- Mark the end of the reader region. In the kernel this is rcu_read_unlock().
Writer protocol:
- Allocate a new copy of the data.
- Modify the new copy with the new values.
- Atomically swap the shared pointer to point at the new copy.
- Wait for the grace period to end, that is, wait for every reader that started before the swap to finish.
- Free the old copy.
Reads are nearly free. One pointer load is the entire cost. Writes are expensive (copy, swap, wait, free) but rare. That asymmetry is the whole reason RCU exists.
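In a GC language, the two protocols collapse to a load on the read side and a copy-and-swap on the write side. A sketch, not a canonical API: the `Counters` record is hypothetical, and the compareAndSet loop is one way to keep concurrent writers from silently losing each other's updates (with a single writer, a plain `set` would do):

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

public class RcuProtocols {
    // Immutable data; a "modified copy" is always a brand-new instance.
    record Counters(long hits, long misses) {
        Counters withHit() { return new Counters(hits + 1, misses); }
    }

    static final AtomicReference<Counters> current =
            new AtomicReference<>(new Counters(0, 0));

    // Reader protocol: load the pointer. Holding the local reference
    // is the "reader region"; the GC supplies the grace period.
    static Counters read() {
        return current.get();
    }

    // Writer protocol: copy, modify the copy, atomically swap.
    // The CAS loop retries if another writer swapped in between.
    static void update(UnaryOperator<Counters> change) {
        Counters old, next;
        do {
            old = current.get();       // 1. load the current copy
            next = change.apply(old);  // 2. allocate a modified copy
        } while (!current.compareAndSet(old, next)); // 3. swap
        // 4-5. grace period and free: handled by the GC.
    }
}
```

`AtomicReference.updateAndGet` wraps exactly this retry loop, so `current.updateAndGet(Counters::withHit)` is the idiomatic one-liner.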
Where the grace period comes from
In the Linux kernel, synchronize_rcu() blocks the writer until every CPU has passed through a quiescent state (for classic RCU, a voluntary context switch) since the swap. A reader region is a stretch of code between rcu_read_lock() and rcu_read_unlock() that never voluntarily yields. So once every CPU has yielded, the kernel knows that no reader region from before the swap can still be running, and the old copy is safe to free.
In garbage-collected languages, the GC handles the grace period automatically. As long as some reader thread still has a local reference to the old copy, the GC keeps the old copy alive. Once the last reader releases its reference, the GC eventually frees the old copy. RCU semantics fall out of AtomicReference in Java or atomic.Pointer[T] in Go without any explicit grace-period machinery.
How RCU compares to a read-write lock
A read-write lock and RCU solve the same kind of problem (read-mostly shared data) with very different cost profiles.
| Path | Read-write lock | RCU |
|---|---|---|
| Reader cost | Increment a shared reader counter, do work, decrement the counter. Two atomics and cache-line traffic between readers. | Load the shared pointer. One atomic load, no shared writes. |
| Writer cost | Block all readers, mutate in place, release. Cheap if writes are short. | Allocate a new copy, modify it, atomically swap the pointer, wait for the grace period, free the old copy. Expensive. |
| Reader-writer interaction | Writers block readers during the write. | Writers never block readers. Old readers keep using the old copy. |
| Best for | Frequent writes, larger data | Rare writes, small data |
If reads dominate by a large factor (a thousand reads per write or more) and the data is small enough to copy, RCU is dramatically faster than a read-write lock. If writes are frequent or the data is large, the copy cost in RCU dominates and a read-write lock (or even a plain mutex) is the better choice.
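Side by side, the two read paths look like this in Java (a sketch; `Config` is a stand-in immutable type):

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ReadPaths {
    record Config(int limit) {}

    // Read-write lock: every read writes to the shared lock state.
    static final ReadWriteLock lock = new ReentrantReadWriteLock();
    static Config locked = new Config(10);

    static int readWithLock() {
        lock.readLock().lock();       // CAS on shared lock state
        try {
            return locked.limit();
        } finally {
            lock.readLock().unlock(); // second write to shared state
        }
    }

    // RCU style: one atomic load, no writes to shared state.
    static final AtomicReference<Config> current =
            new AtomicReference<>(new Config(10));

    static int readRcu() {
        return current.get().limit();
    }
}
```

Under contention, the two lock-state writes per read are what make the read-write lock's readers interfere with each other; the RCU read path writes nothing shared.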
When to use it
There are three shapes of problem that fit RCU well, and they all share the same property: reads happen constantly, writes happen rarely, and the data is small enough that copying it is not a real cost.
Configuration, routing, and dispatch tables. A web service has a config object, a router has a routing table, a dispatcher has a map from request type to handler. Every request reads it. The data is updated when an operator pushes a new config or when a route is registered, which might happen seconds, minutes, or hours apart. Read latency is the constraint that matters.
Hot-swappable immutable services or builders. A service object (a database client, a feature-flag client, a metrics emitter) that needs to be replaced atomically when its underlying configuration changes. Build the new instance, swap the pointer, the old instance is reclaimed when the last in-flight call finishes.
Read-mostly snapshots. Statistics counters, observability data, last-known values that get exposed for monitoring. Readers want a consistent snapshot, the writer publishes a new snapshot every so often.
In all three shapes, the cost on the read path is one pointer load. That is what RCU buys.
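A dispatch table is the canonical shape. A sketch with hypothetical handler names: the map itself is treated as immutable, and registration builds a new map and swaps it in:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Function;

public class Dispatcher {
    // The published copy: an immutable map from request type to handler.
    private final AtomicReference<Map<String, Function<String, String>>> routes =
            new AtomicReference<>(Map.of());

    // Hot path: one pointer load per request.
    public String dispatch(String type, String payload) {
        Function<String, String> h = routes.get().get(type);
        return (h == null) ? "no handler" : h.apply(payload);
    }

    // Rare path: copy the map, add the route, swap the pointer.
    public void register(String type, Function<String, String> handler) {
        routes.updateAndGet(old -> {
            Map<String, Function<String, String>> next = new HashMap<>(old);
            next.put(type, handler);
            return Map.copyOf(next); // publish an immutable copy
        });
    }
}
```

Every in-flight request keeps dispatching against the map it loaded; requests arriving after `register` see the new route.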
When not to use it
The same properties that make RCU fast for read-mostly data make it slow or wrong in three other shapes.
Write-heavy workloads. Every write allocates a new copy and frees the old one. If writes are frequent or the data is large (a hash map with millions of entries, an inventory of tens of thousands of objects), the copy cost dominates and a plain mutex or a read-write lock is faster. RCU is not a general replacement for locks; it is a specialised tool for read-heavy data.
Multi-step consistency across more than one structure. RCU gives a consistent snapshot of whatever the one pointer points at. If correctness depends on two structures changing together (a pair of related caches, an account balance and an audit log), a single RCU pointer is not enough. Coordinating two RCU updates so that no reader sees one updated and the other not is genuinely hard, and most of the time the right answer is to wrap both in one struct and RCU the struct.
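The "wrap both in one struct" fix looks like this in Java (a sketch with hypothetical field names): both structures live in one immutable holder, so a single swap updates them together and every reader sees a matched pair:

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

public class PairedState {
    // Both related structures in ONE immutable holder, so they can
    // only ever be published together.
    record State(Map<String, Integer> balances, long auditSeq) {}

    static final AtomicReference<State> current =
            new AtomicReference<>(new State(Map.of("alice", 100), 0));

    // One swap updates both; no reader can see a new balance with an
    // old audit sequence number, or vice versa.
    static void credit(String account, int amount) {
        current.updateAndGet(s -> {
            var next = new java.util.HashMap<>(s.balances());
            next.merge(account, amount, Integer::sum);
            return new State(Map.copyOf(next), s.auditSeq() + 1);
        });
    }
}
```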
Code that mutates in place. RCU's safety property comes from "no copy is ever modified after publication". Writing to a field of the current copy, even one that nobody is reading right this instant, breaks the invariant. The next reader can see a half-applied update. Copy is not a suggestion; it is the core requirement.
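The difference between keeping and breaking the invariant is one line. A Java sketch; `withTimeout` is a hypothetical copy-style helper, not a standard API:

```java
import java.util.concurrent.atomic.AtomicReference;

public class CopyNotMutate {
    // If Config had mutable fields, writing to one on the published
    // copy could expose a half-applied update to a concurrent reader.
    record Config(int timeoutMs, String endpoint) {
        // Correct: derive a new copy, leave the published one untouched.
        Config withTimeout(int t) { return new Config(t, endpoint); }
    }

    static final AtomicReference<Config> current =
            new AtomicReference<>(new Config(500, "https://api.example"));

    static void shortenTimeout() {
        // WRONG (were Config mutable): current.get().timeoutMs = 250;
        // RIGHT: copy, then swap.
        current.updateAndGet(c -> c.withTimeout(250));
    }
}
```

Using a Java record makes the copy discipline mechanical: the compiler simply offers no way to mutate the published instance.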
In practice
What this looks like in real code depends on the language.
In Java and Go, the read-mostly RCU pattern is built directly into the standard library. AtomicReference<T> in Java and atomic.Pointer[T] in Go give the atomic pointer swap. The garbage collector takes care of the grace period: as long as a reader thread holds a local reference to the old object, the GC keeps it alive, and reclaims it once the last reference goes out of scope. There is no separate "RCU library" to install. For read-heavy hot-swap of small immutable structs, this is the idiomatic way to write the code.
In C and C++ inside the Linux kernel, use the kernel's RCU API directly. The full set of primitives (rcu_read_lock, rcu_read_unlock, rcu_dereference, rcu_assign_pointer, synchronize_rcu, call_rcu, plus the SRCU and sleepable variants) is well-documented and is one of the most performance-critical pieces of the kernel.
In C and C++ in user-space, do not roll your own grace-period mechanism. Use a vetted library: Userspace RCU (the URCU project), folly's RCU support, or libcds.
In Rust, the closest equivalent is crossbeam's epoch-based reclamation (the crossbeam-epoch crate), which underpins lock-free data structures across the ecosystem, including crossbeam's own queues and skip list. A raw std::sync::atomic::AtomicPtr gives the atomic swap but no safe way to know when the old value can be freed; that grace-period gap is exactly what epoch-based reclamation fills.
The mental model to take away
RCU is one sentence: publish a new immutable copy via an atomic pointer swap, and let old readers finish on the old copy before reclaiming it.
A reader sees whatever the pointer was pointing at when the reader loaded it. The writer never touches what existing readers are looking at; it prepares a new copy off to the side and swaps the pointer in one step. The grace period is the polite delay before the old copy is freed.
This pattern shows up everywhere under different names: "hot-swappable config", "atomic replace", "publish-subscribe of immutable snapshots", "copy-on-write singleton". It is not necessary to call it RCU, or to use a library that calls itself RCU, to be using exactly this design.
Implementations
Same pattern in Java. The AtomicReference allows the pointer to be swapped atomically; the GC handles reclamation; readers see either the old or new immutable object, never a partial state.
```java
import java.util.concurrent.atomic.AtomicReference;

public class ConfigService {
    private final AtomicReference<Config> current;

    public ConfigService(Config initial) {
        this.current = new AtomicReference<>(initial);
    }

    // Reader: one atomic load, lock-free
    public Config snapshot() {
        return current.get();
    }

    // Writer: atomic swap; the old Config becomes eligible for GC
    // once the last reader drops its reference
    public void update(Config next) {
        current.set(next);
    }
}

// Reader use:
Config cfg = configService.snapshot();
// Use cfg consistently; even if the writer swaps in a new config,
// this reader keeps seeing the snapshot it loaded.
```

Key points
- Reads are extremely cheap: dereference a pointer. No locks and no atomic read-modify-write operations on the read path.
- Writes: copy the data, modify the copy, atomically swap the pointer to the new copy.
- The old copy must be reclaimed only after all readers that might be using it have moved on.
- Grace period: readers register a "reader region"; reclamation waits for all reader regions that predate the swap to end.
- Best for read-mostly data: routing tables, configuration, dispatch tables. Not for hot writes.
Follow-up questions
- What is the grace period?
- When should I use RCU vs sync.RWMutex (or an RWMutex equivalent)?
- Can I update a single field without copying?
- Why is RCU so popular in the Linux kernel?
Gotchas
- Mutating in place defeats RCU; copy + swap is required.
- Writers must wait for the grace period before freeing the old copy (in non-GC code).
- RCU is read-mostly; for write-heavy data, the copy cost dominates.
- Multi-step updates (modifying two related structures consistently) need extra coordination.
- Long reader regions delay reclamation; in the kernel, this can pin memory.