Concurrent Trie (RCU-Style Read Path)
Concurrent trie: a prefix-indexed tree where reads traverse a stable snapshot (no locks) and writes copy-then-swap nodes (RCU style). Lock-free reads at the cost of some write overhead. Used in Linux routing tables, CHAMP-based persistent collections on the JVM, and persistent data structures generally.
What a trie is
A trie is a tree where each edge represents one character (or one chunk of bits). Words that share a prefix share the path from the root. Inserting the four words to, ten, tea, and ted produces a tree where the single edge t at the top is reused by all four words, and the second edge splits into o (for to) and e (which then branches further into a, d, and n).
This shape is good for prefix queries: autocomplete ("find every word that starts with 'te'"), IP routing (longest-prefix match for a destination address), file system lookups by path. A hash map cannot do prefix queries cheaply; a trie can, because the shared prefix is a single path through the tree.
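A prefix query can be sketched as a walk down the prefix followed by a depth-first collection of the subtree below it. This is a minimal single-threaded illustration (the names `insert`, `collectWithPrefix`, and `dfs` are ours, not from any library):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PrefixSketch {
    // Minimal trie node: one child per character, plus an end-of-word flag.
    static class Node {
        boolean isWord;
        final Map<Character, Node> children = new HashMap<>();
    }

    static void insert(Node root, String word) {
        Node curr = root;
        for (char c : word.toCharArray()) {
            curr = curr.children.computeIfAbsent(c, k -> new Node());
        }
        curr.isWord = true;
    }

    // Walk the prefix once, then enumerate every word below that node.
    static List<String> collectWithPrefix(Node root, String prefix) {
        Node curr = root;
        for (char c : prefix.toCharArray()) {
            curr = curr.children.get(c);
            if (curr == null) return List.of(); // no word has this prefix
        }
        List<String> out = new ArrayList<>();
        dfs(curr, new StringBuilder(prefix), out);
        return out;
    }

    private static void dfs(Node node, StringBuilder path, List<String> out) {
        if (node.isWord) out.add(path.toString());
        for (Map.Entry<Character, Node> e : node.children.entrySet()) {
            path.append(e.getKey());
            dfs(e.getValue(), path, out);
            path.deleteCharAt(path.length() - 1);
        }
    }

    public static void main(String[] args) {
        Node root = new Node();
        for (String w : new String[] {"to", "ten", "tea", "ted"}) insert(root, w);
        // Prints ten, tea, ted in some (map-dependent) order; "to" is excluded.
        System.out.println(collectWithPrefix(root, "te"));
    }
}
```

The whole query touches only the shared prefix path plus the matching subtree, which is what makes it cheap compared to scanning a hash map.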
Concurrent tries matter wherever those prefix queries run hot: networking gear doing per-packet lookups, autocomplete services serving millions of queries per second, distributed key-value stores with prefix indexing.
The RCU pattern, generalised
The right concurrency strategy for a read-heavy trie is Read, Copy, Update, the same pattern the Linux kernel uses for read-heavy data structures. The three steps are:
- Read. Traverse the trie via plain atomic pointer reads. No locks anywhere on the read path. The traversal is internally consistent because the writer never mutates a node that any reader can already see; the writer always prepares a brand-new copy off to the side and only publishes it at the end.
- Update. Copy the path from the root down to the node that needs to change. Apply the change on the copy. Atomically swap the root pointer to the new tree.
- Reclaim. Free the old nodes once every reader that started before the swap has finished. In a garbage-collected language this is automatic; in C and C++ it requires hazard pointers or epoch-based reclamation.
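The three steps can be sketched in miniature with an AtomicReference holding an immutable snapshot. Here a trivial immutable list stands in for the trie; this illustrates the pattern, not a full trie:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

class RcuBox {
    // The published snapshot. Readers load it; writers replace it wholesale.
    private final AtomicReference<List<String>> snapshot =
            new AtomicReference<>(List.of());

    // Read: one atomic load, then lock-free traversal of an immutable value.
    public List<String> read() {
        return snapshot.get();
    }

    // Update: copy, modify the private copy, then CAS-publish; retry on a race.
    public void add(String item) {
        while (true) {
            List<String> old = snapshot.get();
            List<String> copy = new ArrayList<>(old);        // the "copy" step
            copy.add(item);                                  // mutate the copy only
            List<String> fresh = Collections.unmodifiableList(copy);
            if (snapshot.compareAndSet(old, fresh)) return;  // the "update" (swap)
            // Another writer won the race; loop and rebuild from its snapshot.
        }
    }
    // Reclaim: the JVM's GC frees the old list once no reader still holds it.
}
```

A reader that captured a snapshot before `add` runs keeps seeing its old, fully consistent list; only the next `read()` observes the new one.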
The key trick is structural sharing. The new tree does not need to be a complete copy of the old tree; it only needs new copies of the nodes on the changed path. Every other branch of the tree can be shared with the old tree, because nothing has changed in those branches.
Updating the value stored at the word ten works as follows. Before the update, there is one tree rooted at root. The writer copies the spine from the root down to the n node (the one that needs to change), but the unrelated branches a (for tea), d (for ted), and o (for to) are not copied; the new tree shares them with the old. The writer then swaps the root pointer atomically.
The shared branches physically live once in memory, and both trees reference them. The spine from root through t and e is duplicated because every node along the changed path had to be re-created with a new child pointer. The new n node carries the actual update.
A reader that started its traversal before the root swap sees root_old and walks down the old spine (t, e, n with the old value). A reader that starts after the root swap sees root_new and walks down the new spine (new t, e, n). Both readers see internally consistent trees; neither ever sees a half-finished mutation. Once every reader that started before the swap has moved on, the old spine nodes (the old root, t, e, and n) have no incoming references and can be reclaimed; the shared o, a, and d branches live on in the new tree.
Reads on this design are about as fast as it is possible to make them: a sequence of plain pointer loads, with no locks and no CAS anywhere on the read path. Updates pay for the spine copy. The trade is excellent whenever reads vastly outnumber writes, which is exactly the shape of routing tables, autocomplete dictionaries, and configuration tries.
Java's CHAMP (Compressed Hash-Array Mapped Prefix-tree)
Java's standard ConcurrentHashMap does not use this design, but several immutable persistent collection libraries do. CHAMP-based maps (used by Capsule, Vavr, and Scala 2.13's immutable HashMap; Clojure's PersistentHashMap uses the earlier HAMT layout that CHAMP refines) build on the same path-copy idea but with a wider branching factor: each node has up to 32 children, and a small bitmap tells the reader which slots are occupied. The wide branching keeps the tree shallow (at most seven levels for a 32-bit hash consumed five bits at a time) while still giving structural sharing on update. The result is an immutable hash map with O(log₃₂ n) reads and writes that supports lock-free reads through the same atomic-root-swap mechanism described above.
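The bitmap trick works roughly like this: a node stores only its occupied slots in a dense array, and a popcount over the bitmap converts a 5-bit hash fragment into an array index. This is a sketch of the indexing step only, not a full CHAMP node:

```java
class BitmapIndexing {
    // A node conceptually has 32 logical slots but stores only the occupied
    // ones, packed densely. Bit i of the bitmap is set iff slot i is occupied.
    static boolean occupied(int bitmap, int slot) {
        return (bitmap & (1 << slot)) != 0;
    }

    // Dense-array position of a logical slot: count the set bits below it.
    static int arrayIndex(int bitmap, int slot) {
        return Integer.bitCount(bitmap & ((1 << slot) - 1));
    }

    public static void main(String[] args) {
        int bitmap = 0b0010_0100_0001_0010; // slots 1, 4, 10, 13 occupied
        System.out.println(occupied(bitmap, 10));   // true
        System.out.println(arrayIndex(bitmap, 10)); // 2: slots 1 and 4 precede it
    }
}
```

Because empty slots cost nothing, a sparsely populated node takes a few words of memory instead of a 32-pointer array, which is what keeps CHAMP maps compact.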
When not to use it
The path-copy trie is a specialised tool, just like the Disruptor and RCU. The cases where it loses are:
- Reads and writes are balanced. Every write copies the entire spine from root to leaf. If writes happen as often as reads, that copy cost dominates and a regular concurrent hash map is faster.
- The data is small. A trie has overhead per node (the children array or the bitmap, plus the atomic pointer fields). For a few dozen entries the constant-factor cost outweighs the tree-traversal benefit.
- No prefix queries are needed. If the access pattern is purely "look up by exact key", a concurrent hash map is simpler and scales better. The trie's main advantage (cheap prefix iteration) is unused.
For everything that does have heavy prefix queries and a strong read-to-write ratio, the path-copy trie remains one of the cleanest concurrent designs available.
Implementations
Each node has a fixed 26-slot child array (or a map of chars to children). Insert clones the path from root to leaf, leaving sibling branches shared with the old tree, then CASes the root atomic reference. Readers traverse without locks.
import java.util.concurrent.atomic.AtomicReference;

class PersistentTrie {
    static class Node {
        final boolean isWord;
        final Node[] children; // immutable after construction

        Node(boolean isWord, Node[] children) {
            this.isWord = isWord;
            this.children = children;
        }

        static Node empty() {
            return new Node(false, new Node[26]);
        }
    }

    private final AtomicReference<Node> root = new AtomicReference<>(Node.empty());

    // Lock-free read: one atomic load of the root, then plain field reads
    // over immutable nodes.
    public boolean contains(String word) {
        Node curr = root.get();
        for (char c : word.toCharArray()) {
            if (curr == null) return false;
            curr = curr.children[c - 'a']; // assumes lowercase a-z keys
        }
        return curr != null && curr.isWord;
    }

    // Path-copying write: build a new spine off to the side, then CAS the root.
    public void insert(String word) {
        while (true) {
            Node oldRoot = root.get();
            Node newRoot = insertHelper(oldRoot, word, 0);
            if (root.compareAndSet(oldRoot, newRoot)) return;
            // A concurrent writer swapped the root first; retry against it.
        }
    }

    private Node insertHelper(Node node, String word, int idx) {
        if (idx == word.length()) {
            return new Node(true, node == null ? new Node[26] : node.children.clone());
        }
        int c = word.charAt(idx) - 'a';
        Node[] newChildren = node == null ? new Node[26] : node.children.clone();
        newChildren[c] = insertHelper(newChildren[c], word, idx + 1);
        return new Node(node != null && node.isWord, newChildren);
    }
}

Key points
- Trie = tree indexed by string prefix; common in routing, autocomplete, dictionaries
- RCU (read-copy-update): readers see consistent snapshots; writers copy + atomic swap
- Lock-free reads: traverse atomic-pointer fields; never block
- Writers serialize on the root-pointer CAS; in this design any two concurrent writes conflict and one retries, even when they touch different branches
- Memory: temporary copies on update; old nodes are freed once no reader references them