RCU: Read-Copy-Update
Read-Copy-Update: writers create a new copy and atomically swap the pointer; readers see either the old or the new, never a partial state. Reads are nearly free (often just a pointer load). Writers are slower but rare. Used heavily in the Linux kernel for read-mostly data structures.
The library-book analogy
Picture a town library with one shelf set aside for the official phone directory. People walk in all day, grab the directory, look up a number, put it back, and leave. These are the readers. Once a week, the librarian receives a new edition with a few corrections. The librarian is the writer.
The naive way to update the directory would be to make every reader stop, snatch the book away, write the corrections in place, and let everyone resume. That blocks every reader for the duration of the update. RCU does something different.
- The librarian prints a complete new edition off-site, with the corrections already applied. No reader has any idea this is happening.
- The librarian walks to the shelf and swaps the new edition in. The old edition is moved into a back room. Anyone who walks in from this point sees the new edition.
- Anyone who happened to already be reading the old edition keeps reading it; nothing has changed for them. The librarian waits until every one of those readers has finished and put the old edition back.
- Once nobody is reading the old edition, the librarian throws it away.
That is RCU. The full name spells it out: Read, Copy, Update. Readers read freely. Writers prepare a new copy and swap it in. Old copies sit in the back room until no reader is using them, then they go away.
The waiting period in step 3 is called the grace period. It is the only piece of the protocol that needs special machinery; everything else is "do nothing different than usual".
A picture of the pattern
There is a single shared pointer that every reader loads and dereferences. At any moment that pointer points at one immutable copy of the data. The writer never modifies the existing copy. Instead, the writer prepares a new copy, swaps the pointer to point at the new copy, and lets the old copy stay alive in memory until everyone who already loaded the old pointer is done with it.
Steady state. The shared pointer points at the current copy. All readers dereference it and see the same value.
The cost on the read path is a single pointer load. Nothing else is needed because the data behind the pointer never changes while the pointer is pointing at it.
During an update. The writer allocates a new copy with the new values, then atomically swaps the shared pointer to the new copy. Readers that load the pointer after the swap see the new copy. Readers that loaded the pointer before the swap still hold a reference to the old copy and keep working with it.
No reader ever sees a half-updated value, because no copy is ever modified after it is published. Each reader holds a fully-formed snapshot.
Reclaiming the old copy. Once every reader that started before the swap has finished, the old copy has no outstanding readers and the writer can free it. The window between the swap and that moment is the grace period.
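In a GC language, the entire picture above collapses to an atomic reference holding an immutable object. A minimal Java sketch; the `Settings` record and its fields are illustrative, not from any particular library:

```java
import java.util.concurrent.atomic.AtomicReference;

public class RcuSketch {
    // One immutable copy of the data; never modified after publication.
    record Settings(int timeoutMs, String endpoint) {}

    // The single shared pointer that readers load and writers swap.
    static final AtomicReference<Settings> current =
            new AtomicReference<>(new Settings(500, "https://old.example"));

    public static void main(String[] args) {
        // A reader loads the pointer once and holds a snapshot.
        Settings snapshot = current.get();

        // The writer publishes a new immutable copy with one atomic swap.
        current.set(new Settings(250, "https://new.example"));

        // The old reader still sees its fully-formed old copy...
        System.out.println(snapshot.timeoutMs());      // 500
        // ...while anyone loading the pointer now sees the new copy.
        System.out.println(current.get().timeoutMs()); // 250
    }
}
```

The GC plays the role of the back room: the old `Settings` stays alive as long as `snapshot` references it, and is collected afterwards.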
The reader and writer protocols
Reads run constantly and have to be cheap. Writes run rarely and can afford to be slower.
Reader protocol:
- Mark the start of a "reader region". In the Linux kernel this is rcu_read_lock(). In a GC language, this is just holding a local reference to the loaded pointer.
- Load the pointer.
- Use the data through that pointer. It will not change underneath the reader, because nothing modifies a published copy.
- Mark the end of the reader region. In the kernel this is rcu_read_unlock().
Writer protocol:
- Allocate a new copy of the data.
- Modify the new copy with the new values.
- Atomically swap the shared pointer to point at the new copy.
- Wait for the grace period to end, that is, wait for every reader that started before the swap to finish.
- Free the old copy.
Reads are nearly free. One pointer load is the entire cost. Writes are expensive (copy, swap, wait, free) but rare. That asymmetry is the whole reason RCU exists.
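In a GC language, the two protocols collapse to a load on the read side and a copy-and-swap on the write side. A sketch, not a canonical API: the `Counters` record is hypothetical, and the compareAndSet loop is one way to keep concurrent writers from silently losing each other's updates (with a single writer, a plain `set` would do):

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

public class RcuProtocols {
    // Immutable data; a "modified copy" is always a brand-new instance.
    record Counters(long hits, long misses) {
        Counters withHit() { return new Counters(hits + 1, misses); }
    }

    static final AtomicReference<Counters> current =
            new AtomicReference<>(new Counters(0, 0));

    // Reader protocol: load the pointer. Holding the local reference
    // is the "reader region"; the GC supplies the grace period.
    static Counters read() {
        return current.get();
    }

    // Writer protocol: copy, modify the copy, atomically swap.
    // The CAS loop retries if another writer swapped in between.
    static void update(UnaryOperator<Counters> change) {
        Counters old, next;
        do {
            old = current.get();       // 1. load the current copy
            next = change.apply(old);  // 2. allocate a modified copy
        } while (!current.compareAndSet(old, next)); // 3. swap
        // 4-5. grace period and free: handled by the GC.
    }
}
```

`AtomicReference.updateAndGet` wraps exactly this retry loop, so `current.updateAndGet(Counters::withHit)` is the idiomatic one-liner.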
Where the grace period comes from
In the Linux kernel, synchronize_rcu() blocks the writer until every CPU has passed through a quiescent state (for classic RCU, a voluntary context switch) since the swap. A reader region is a stretch of code between rcu_read_lock() and rcu_read_unlock() that never voluntarily yields. So once every CPU has yielded, the kernel knows that no reader region from before the swap can still be running, and the old copy is safe to free.
In garbage-collected languages, the GC handles the grace period automatically. As long as some reader thread still has a local reference to the old copy, the GC keeps the old copy alive. Once the last reader releases its reference, the GC eventually frees the old copy. RCU semantics fall out of AtomicReference in Java or atomic.Pointer[T] in Go without any explicit grace-period machinery.
How RCU compares to a read-write lock
A read-write lock and RCU solve the same kind of problem (read-mostly shared data) with very different cost profiles.
| Path | Read-write lock | RCU |
|---|---|---|
| Reader cost | Increment a shared reader counter, do work, decrement the counter. Two atomics and cache-line traffic between readers. | Load the shared pointer. One atomic load, no shared writes. |
| Writer cost | Block all readers, mutate in place, release. Cheap if writes are short. | Allocate a new copy, modify it, atomically swap the pointer, wait for the grace period, free the old copy. Expensive. |
| Reader-writer interaction | Writers block readers during the write. | Writers never block readers. Old readers keep using the old copy. |
| Best for | Frequent writes, larger data | Rare writes, small data |
If reads dominate by a large factor (a thousand reads per write or more) and the data is small enough to copy, RCU is dramatically faster than a read-write lock. If writes are frequent or the data is large, the copy cost in RCU dominates and a read-write lock (or even a plain mutex) is the better choice.
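Side by side, the two read paths look like this in Java (a sketch; `Config` is a stand-in immutable type):

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ReadPaths {
    record Config(int limit) {}

    // Read-write lock: every read writes to the shared lock state.
    static final ReadWriteLock lock = new ReentrantReadWriteLock();
    static Config locked = new Config(10);

    static int readWithLock() {
        lock.readLock().lock();       // CAS on shared lock state
        try {
            return locked.limit();
        } finally {
            lock.readLock().unlock(); // second write to shared state
        }
    }

    // RCU style: one atomic load, no writes to shared state.
    static final AtomicReference<Config> current =
            new AtomicReference<>(new Config(10));

    static int readRcu() {
        return current.get().limit();
    }
}
```

Under contention, the two lock-state writes per read are what make the read-write lock's readers interfere with each other; the RCU read path writes nothing shared.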
When to use it
There are three shapes of problem that fit RCU well, and they all share the same property: reads happen constantly, writes happen rarely, and the data is small enough that copying it is not a real cost.
Configuration, routing, and dispatch tables. A web service has a config object, a router has a routing table, a dispatcher has a map from request type to handler. Every request reads it. The data is updated when an operator pushes a new config or when a route is registered, which might happen seconds, minutes, or hours apart. Read latency is the constraint that matters.
Hot-swappable immutable services or builders. A service object (a database client, a feature-flag client, a metrics emitter) that needs to be replaced atomically when its underlying configuration changes. Build the new instance, swap the pointer, the old instance is reclaimed when the last in-flight call finishes.
Read-mostly snapshots. Statistics counters, observability data, last-known values that get exposed for monitoring. Readers want a consistent snapshot, the writer publishes a new snapshot every so often.
In all three shapes, the cost on the read path is one pointer load. That is what RCU buys.
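A dispatch table is the canonical shape. A sketch with hypothetical handler names: the map itself is treated as immutable, and registration builds a new map and swaps it in:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Function;

public class Dispatcher {
    // The published copy: an immutable map from request type to handler.
    private final AtomicReference<Map<String, Function<String, String>>> routes =
            new AtomicReference<>(Map.of());

    // Hot path: one pointer load per request.
    public String dispatch(String type, String payload) {
        Function<String, String> h = routes.get().get(type);
        return (h == null) ? "no handler" : h.apply(payload);
    }

    // Rare path: copy the map, add the route, swap the pointer.
    public void register(String type, Function<String, String> handler) {
        routes.updateAndGet(old -> {
            Map<String, Function<String, String>> next = new HashMap<>(old);
            next.put(type, handler);
            return Map.copyOf(next); // publish an immutable copy
        });
    }
}
```

Every in-flight request keeps dispatching against the map it loaded; requests arriving after `register` see the new route.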
When not to use it
The same properties that make RCU fast for read-mostly data make it slow or wrong in three other shapes.
Write-heavy workloads. Every write allocates a new copy and frees the old one. If writes are frequent or the data is large (a hash map with millions of entries, an inventory of tens of thousands of objects), the copy cost dominates and a plain mutex or a read-write lock is faster. RCU is not a general replacement for locks; it is a specialised tool for read-heavy data.
Multi-step consistency across more than one structure. RCU gives a consistent snapshot of whatever the one pointer points at. If correctness depends on two structures changing together (a pair of related caches, an account balance and an audit log), a single RCU pointer is not enough. Coordinating two RCU updates so that no reader sees one updated and the other not is genuinely hard, and most of the time the right answer is to wrap both in one struct and RCU the struct.
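The "wrap both in one struct" fix looks like this in Java (a sketch with hypothetical field names): both structures live in one immutable holder, so a single swap updates them together and every reader sees a matched pair:

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

public class PairedState {
    // Both related structures in ONE immutable holder, so they can
    // only ever be published together.
    record State(Map<String, Integer> balances, long auditSeq) {}

    static final AtomicReference<State> current =
            new AtomicReference<>(new State(Map.of("alice", 100), 0));

    // One swap updates both; no reader can see a new balance with an
    // old audit sequence number, or vice versa.
    static void credit(String account, int amount) {
        current.updateAndGet(s -> {
            var next = new java.util.HashMap<>(s.balances());
            next.merge(account, amount, Integer::sum);
            return new State(Map.copyOf(next), s.auditSeq() + 1);
        });
    }
}
```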
Code that mutates in place. RCU's safety property comes from "no copy is ever modified after publication". Writing to a field of the current copy, even one that nobody is reading right this instant, breaks the invariant. The next reader can see a half-applied update. Copy is not a suggestion; it is the core requirement.
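The difference between keeping and breaking the invariant is one line. A Java sketch; `withTimeout` is a hypothetical copy-style helper, not a standard API:

```java
import java.util.concurrent.atomic.AtomicReference;

public class CopyNotMutate {
    // If Config had mutable fields, writing to one on the published
    // copy could expose a half-applied update to a concurrent reader.
    record Config(int timeoutMs, String endpoint) {
        // Correct: derive a new copy, leave the published one untouched.
        Config withTimeout(int t) { return new Config(t, endpoint); }
    }

    static final AtomicReference<Config> current =
            new AtomicReference<>(new Config(500, "https://api.example"));

    static void shortenTimeout() {
        // WRONG (were Config mutable): current.get().timeoutMs = 250;
        // RIGHT: copy, then swap.
        current.updateAndGet(c -> c.withTimeout(250));
    }
}
```

Using a Java record makes the copy discipline mechanical: the compiler simply offers no way to mutate the published instance.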
In practice
What this looks like in real code depends on the language.
In Java and Go, the read-mostly RCU pattern is built directly into the standard library. AtomicReference<T> in Java and atomic.Pointer[T] in Go give the atomic pointer swap. The garbage collector takes care of the grace period: as long as a reader thread holds a local reference to the old object, the GC keeps it alive, and reclaims it once the last reference goes out of scope. There is no separate "RCU library" to install. For read-heavy hot-swap of small immutable structs, this is the idiomatic way to write the code.
In C and C++ inside the Linux kernel, use the kernel's RCU API directly. The full set of primitives (rcu_read_lock, rcu_read_unlock, rcu_dereference, rcu_assign_pointer, synchronize_rcu, call_rcu, plus the SRCU and sleepable variants) is well-documented and is one of the most performance-critical pieces of the kernel.
In C and C++ in user-space, do not roll your own grace-period mechanism. Use a vetted library: Userspace RCU (the URCU project), folly's RCU support, or libcds.
In Rust, the closest equivalent is crossbeam's epoch-based reclamation (the crossbeam-epoch crate), which underpins lock-free data structures across the ecosystem, including crossbeam's own queues and skip list. A raw std::sync::atomic::AtomicPtr gives the atomic swap but no safe way to know when the old value can be freed; that grace-period gap is exactly what epoch-based reclamation fills.
The mental model to take away
RCU is one sentence: publish a new immutable copy via an atomic pointer swap, and let old readers finish on the old copy before reclaiming it.
A reader sees whatever the pointer was pointing at when the reader loaded it. The writer never touches what existing readers are looking at; it prepares a new copy off to the side and swaps the pointer in one step. The grace period is the polite delay before the old copy is freed.
This pattern shows up everywhere under different names: "hot-swappable config", "atomic replace", "publish-subscribe of immutable snapshots", "copy-on-write singleton". It is not necessary to call it RCU, or to use a library that calls itself RCU, to be using exactly this design.
Implementations
Same pattern in Java. The AtomicReference allows the pointer to be swapped atomically; the GC handles reclamation; readers see either the old or new immutable object, never a partial state.
```java
import java.util.concurrent.atomic.AtomicReference;

public class ConfigService {
    private final AtomicReference<Config> current;

    public ConfigService(Config initial) {
        this.current = new AtomicReference<>(initial);
    }

    // Reader: one atomic load, lock-free
    public Config snapshot() {
        return current.get();
    }

    // Writer: atomic swap; the old Config becomes eligible for GC
    // once the last reader drops its reference
    public void update(Config next) {
        current.set(next);
    }
}

// Reader use:
Config cfg = configService.snapshot();
// Use cfg consistently; even if the writer swaps in a new config,
// this reader keeps seeing the snapshot it loaded.
```

Key points
- Reads are extremely cheap: dereference a pointer. No locks and no atomic read-modify-write operations on the read path.
- Writes: copy the data, modify the copy, atomically swap the pointer to the new copy.
- The old copy must be reclaimed only after all readers that might be using it have moved on.
- Grace period: readers register a "reader region"; reclamation waits for all reader regions that predate the swap to end.
- Best for read-mostly data: routing tables, configuration, dispatch tables. Not for hot writes.
Follow-up questions
- What is the grace period?
- When should I use RCU vs sync.RWMutex (or an RWMutex equivalent)?
- Can I update a single field without copying?
- Why is RCU so popular in the Linux kernel?
Gotchas
- Mutating in place defeats RCU; copy + swap is required.
- Writers must wait for the grace period before freeing the old copy (in non-GC code).
- RCU is read-mostly; for write-heavy data, the copy cost dominates.
- Multi-step updates (modifying two related structures consistently) need extra coordination.
- Long reader regions delay reclamation; in the kernel, this can pin memory.