Consistent Hashing & Ring Algorithms
The Problem with Modulo Hashing
You have 100 cache servers. You pick one for each key with hash(key) % 100. It works. Distribution is uniform. Lookups are O(1). Life is good.
Then you add server 101.
hash(key) % 101 produces a completely different result than hash(key) % 100 for almost every key. About 99 out of 100 keys now map to a different server. Your entire cache goes cold simultaneously. Every backend database behind that cache gets slammed with traffic that was previously absorbed by the cache layer. You just caused your own outage by scaling up.
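You can see the blast radius directly by counting remapped keys. A quick sketch (the key names and choice of MD5 are purely illustrative):

```python
import hashlib

def bucket(key: str, n: int) -> int:
    # Hash the key to a 64-bit integer, then take it modulo n.
    h = int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")
    return h % n

keys = [f"key-{i}" for i in range(100_000)]
moved = sum(bucket(k, 100) != bucket(k, 101) for k in keys)
print(f"{moved / len(keys):.1%} of keys remapped")  # prints roughly 99%
```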
This is not a theoretical concern. It is the exact scenario that motivated the original consistent hashing paper from Karger et al. at MIT in 1997, written in collaboration with Akamai to solve CDN cache distribution.
The Ring
Consistent hashing arranges the hash space as a ring (sometimes called a hash circle). Imagine the output of your hash function as a circle from 0 to 2^32. Each server gets hashed to a position on this ring. Each key also gets hashed to a position on the ring. To find which server owns a key, start at the key's position and walk clockwise until you hit a server.
When you add a new server, it takes a position on the ring and absorbs keys from the segment between it and its counter-clockwise neighbor. Only keys in that one segment move. Everything else stays put.
When you remove a server, its keys shift clockwise to the next server on the ring. Again, only one segment moves.
With N servers, adding or removing one remaps approximately 1/N of the total keys. That is the core property. With 100 servers, you remap about 1% of keys instead of 99%.
Virtual Nodes: Why the Raw Ring is Not Enough
Here is the problem nobody mentions in the textbook explanation. Hash functions are not perfect. If you hash three server names to positions on the ring, you might get positions 100, 200, and 950000. Server C now owns almost the entire ring. The "uniform distribution" guarantee of hash functions applies to large numbers of samples, and three is not a large number.
Virtual nodes fix this. Instead of hashing each server to one position, hash it to 150 positions. Server A gets positions from hash("A-0"), hash("A-1"), through hash("A-149"). Server B gets a similar set. Now you have 450 points scattered around the ring, and the law of large numbers actually kicks in. Load variance drops dramatically.
The math behind this is worth understanding intuitively. With N servers and V virtual nodes each, the total ring positions are N*V. The expected load per server is 1/N of the total keyspace. The standard deviation of the load is proportional to 1/sqrt(V), so doubling your virtual nodes shrinks the standard deviation by about 30% (a factor of 1/sqrt(2)). Going from 10 virtual nodes (where you might see 2x load imbalance) to 150 (where imbalance drops below 10%) is a significant jump in fairness.
The tradeoff: more virtual nodes means a larger ring data structure in memory and slightly slower lookups since you are binary-searching through more points. In practice, 150 virtual nodes per server is the common recommendation because the memory is negligible (a few KB per server) and the lookup remains fast with binary search.
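You can check the 1/sqrt(V) behavior empirically. A rough simulation (server names and hash choice are illustrative) that builds the ring positions and measures the most-loaded server's share relative to its fair share:

```python
import hashlib

def ring_pos(label: str) -> int:
    # Map a label to a position on a 32-bit ring.
    return int.from_bytes(hashlib.md5(label.encode()).digest()[:4], "big")

def max_imbalance(servers: list[str], vnodes: int) -> float:
    # Place vnodes points per server, sorted around the ring.
    points = sorted((ring_pos(f"{s}-{i}"), s)
                    for s in servers for i in range(vnodes))
    space = 1 << 32
    owned = {s: 0 for s in servers}
    for idx, (pos, s) in enumerate(points):
        prev = points[idx - 1][0]        # Python's -1 index handles the wrap
        owned[s] += (pos - prev) % space # arc owned by this virtual node
    fair = 1 / len(servers)
    return max(owned[s] / space for s in servers) / fair

servers = [f"server-{i}" for i in range(10)]
print(max_imbalance(servers, 10))   # often 1.5-2x the fair share
print(max_imbalance(servers, 150))  # typically within 10-20% of fair
```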
Lookup Mechanics
The ring is typically stored as a sorted array of (position, server) pairs. Looking up a key means:
- Hash the key to get a ring position
- Binary search the sorted array for the first position >= the key's hash
- If you fall off the end, wrap to the first entry (it is a ring)
That binary search gives you O(log(N * V)) lookup time, where N is servers and V is virtual nodes per server. With 100 servers and 150 virtual nodes, that is log(15000), about 14 comparisons. Trivial.
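The wrap-around is the only subtle part. A minimal sketch using Python's standard-library bisect, assuming parallel sorted lists of ring positions and their owning servers:

```python
import bisect

def lookup(positions: list[int], servers: list[str], key_hash: int) -> str:
    """positions is sorted ascending; servers[i] owns ring point positions[i]."""
    # Find the first ring position >= the key's hash.
    idx = bisect.bisect_left(positions, key_hash)
    # If we fell off the end, idx == len, and the modulo wraps to entry 0.
    return servers[idx % len(servers)]
```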
Some implementations use a skip list for concurrent access. Others, like Cassandra, precompute a token-to-server map and distribute it to all nodes so lookups are effectively a local binary search with no network hop.
Jump Hash: The Memoryless Alternative
In 2014, Google published Jump Hash, a consistent hash function that needs no ring at all. It is a short, elegant algorithm (about 10 lines of code) that maps a key to one of N buckets with perfect uniformity. No virtual nodes needed. No data structure in memory. Just math.
The function uses the key as a seed for a pseudorandom sequence and "jumps" through bucket assignments. Each jump has a decreasing probability of changing the assignment, and the math works out so that adding bucket N+1 only remaps 1/(N+1) of keys. It is optimal.
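The algorithm is short enough to reproduce in full. This is a direct port of the C++ from the Lamping and Veach paper; the caller supplies a 64-bit integer hash of the key:

```python
def jump_hash(key: int, num_buckets: int) -> int:
    """Jump consistent hash (Lamping & Veach, 2014).

    key: a 64-bit integer hash of the item.
    Returns a bucket index in [0, num_buckets).
    """
    b, j = -1, 0
    while j < num_buckets:
        b = j
        # Advance a 64-bit linear congruential generator seeded by the key.
        key = (key * 2862933555777941757 + 1) % (1 << 64)
        # Jump forward; the expected jump distance grows with b.
        j = int((b + 1) * ((1 << 31) / ((key >> 33) + 1)))
    return b
```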
The catch: Jump Hash only supports adding or removing the last bucket. You can grow from 10 to 11 servers, but you cannot remove server 5 from a 10-server cluster. The bucket indices must be contiguous from 0 to N-1.
This makes Jump Hash ideal for sharded databases where shards only grow (you split shards but never remove them), for append-only distributed logs, and for systems where the server set is managed by a coordinator that can reassign logical shard IDs. It is a poor fit for clusters where arbitrary nodes can fail and be removed.
Jump Hash is also remarkably fast. No memory allocation, no data structure traversal. Just a loop with a few multiplications. Benchmarks typically show it at 3-5x faster than ring-based consistent hashing.
Rendezvous Hashing: Highest Random Weight
Rendezvous hashing (also called HRW) takes a completely different approach. For each key, compute hash(key, server) for every server in the cluster. Pick the server with the highest hash value. That is it.
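The whole algorithm fits in a few lines. A sketch, using MD5 purely as a convenient stand-in for a fast non-cryptographic hash:

```python
import hashlib

def rendezvous_owner(key: str, servers: list[str]) -> str:
    # Highest random weight: hash every (key, server) pair, take the max.
    def score(server: str) -> int:
        return int.from_bytes(
            hashlib.md5(f"{key}|{server}".encode()).digest(), "big")
    return max(servers, key=score)
```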
Adding a server: the new server will win the highest hash for approximately 1/N of the keys, stealing them from their current owners. All other keys stay put. Removing a server: its keys redistribute to whichever remaining server has the next-highest hash for each key. This is optimal, just like ring-based consistent hashing.
The elegance is that there is no ring, no virtual nodes, no data structure. The algorithm is stateless. Two clients with the same server list will always agree on key placement without any coordination.
The cost is O(N) per lookup because you hash against every server. For 10 servers, that is nothing. For 100, still fine. For 10,000, the per-lookup cost starts to matter if you are doing millions of lookups per second. In practice, rendezvous hashing works well for DNS-based load balancing, CDN origin selection, and any system with fewer than a few hundred servers.
A nice property: rendezvous hashing handles weighted servers naturally. The standard logarithmic method normalizes hash(key, server) to a float h in (0, 1) and scores each server as -weight[server] / ln(h); the highest score wins, and each server receives a share of keys proportional to its weight.
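A sketch of that weighted variant, with the hash normalized into the open interval (0, 1) before taking the logarithm (server names, weights, and the hash choice are illustrative):

```python
import hashlib
import math

def weighted_owner(key: str, servers: dict[str, float]) -> str:
    """servers maps server name -> weight; score = -w / ln(h)."""
    def score(item: tuple[str, float]) -> float:
        name, weight = item
        raw = int.from_bytes(
            hashlib.md5(f"{key}|{name}".encode()).digest()[:8], "big")
        h = (raw + 1) / ((1 << 64) + 1)  # keep h strictly inside (0, 1)
        return -weight / math.log(h)     # ln(h) < 0, so scores are positive
    return max(servers.items(), key=score)[0]

# A server with weight 2.0 wins about twice as many keys as one with 1.0.
print(weighted_owner("user-42", {"a": 1.0, "b": 2.0, "c": 1.0}))
```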
Maglev Hashing: O(1) Lookups for Load Balancers
Google's Maglev paper (2016) describes a consistent hash designed for their network load balancers, where every packet needs a server assignment and per-packet latency matters.
The idea: build a lookup table of size M (a large prime, typically 65537). Each server populates entries in the table using a permutation derived from its hash. After construction, looking up a key is just table[hash(key) % M], which is O(1).
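A sketch of that construction as described in the paper: each server derives a permutation of table slots from an offset and a skip, and servers claim their next unclaimed preferred slot in round-robin order until the table fills. The hash helper here is an illustrative stand-in, not the paper's choice:

```python
import hashlib

def _h(label: str, seed: str) -> int:
    digest = hashlib.md5(f"{seed}:{label}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def build_table(servers: list[str], M: int = 65537) -> list[int]:
    n = len(servers)
    # Each server's preference list is (offset + j * skip) mod M for j = 0, 1, ...
    # M is prime, so any skip in [1, M-1] yields a full permutation of slots.
    offsets = [_h(s, "offset") % M for s in servers]
    skips = [_h(s, "skip") % (M - 1) + 1 for s in servers]
    next_j = [0] * n
    table = [-1] * M
    filled = 0
    while True:
        for i in range(n):
            # Advance server i's permutation to its next empty slot.
            while True:
                pos = (offsets[i] + next_j[i] * skips[i]) % M
                next_j[i] += 1
                if table[pos] == -1:
                    break
            table[pos] = i
            filled += 1
            if filled == M:
                return table

def lookup(table: list[int], servers: list[str], key: str) -> str:
    # After construction, a lookup is a single O(1) table index.
    return servers[table[_h(key, "key") % len(table)]]
```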
When a server is added or removed, the table is rebuilt. The key property is that the rebuild produces a table that differs from the old one in a minimal number of entries. Keys only move when strictly necessary.
The downside is that table construction is O(M * N), which can take milliseconds for large clusters. But construction happens only on membership changes, not per-lookup. For a load balancer processing millions of packets per second, amortizing a 5ms table rebuild over hours of O(1) lookups is a great trade.
Weighted Consistent Hashing
Real clusters have heterogeneous servers. A new server with 256GB RAM should handle more keys than the old 64GB machine sitting next to it. Two approaches:
Virtual node weighting. Give the 256GB machine 4x as many virtual nodes as the 64GB machine. It occupies more of the ring and absorbs more keys. Simple and effective, but changing weights requires removing and re-adding virtual nodes, which causes some key movement.
Bounded-load consistent hashing (Google, 2017). Set a maximum load per server (e.g., 1.25x the average). When the clockwise-nearest server is over its limit, the key continues clockwise to the next server. This prevents hotspots from popular keys overwhelming a single server. The bound is tunable: lower bounds give better balance but remap more keys on membership changes. A bound of 1 + epsilon where epsilon is small gives nearly perfect balance.
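A sketch of the bounded-load lookup rule, assuming a prebuilt sorted ring (parallel position and server lists) and a running load counter per server; for a 1.25x bound the capacity would typically be set to ceil(1.25 * total_keys / num_servers):

```python
import bisect

def bounded_lookup(positions: list[int], servers: list[str],
                   loads: dict[str, int], key_hash: int,
                   capacity: int) -> str:
    """Walk clockwise from the key's position, skipping servers at capacity."""
    idx = bisect.bisect_left(positions, key_hash)
    for step in range(len(positions)):
        server = servers[(idx + step) % len(servers)]
        if loads[server] < capacity:
            loads[server] += 1
            return server
    raise RuntimeError("all servers at capacity")
```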
When NOT to Use Consistent Hashing
Consistent hashing has real overhead: the ring data structure, virtual node management, the binary search per lookup. Sometimes that overhead is not justified.
Batch processing with cold starts. If your MapReduce job starts fresh every time, cache locality does not matter. Use hash(key) % N for simplicity and perfect uniformity.
Small, stable server sets. If you have 3 database replicas that have not changed in two years, consistent hashing is solving a problem you do not have.
When you control the rebalancing. Systems like Kafka use partition assignment controlled by a coordinator. The coordinator can implement any rebalancing strategy it wants without needing the properties of consistent hashing.
When you want even distribution above all else. Modulo hashing and Jump Hash both give better uniformity than ring-based consistent hashing. If your server set changes infrequently and you can tolerate a full remap, the simpler options win.
The Implementation Details That Bite
Hash function choice matters. MD5 and SHA-1 are common in consistent hashing implementations because they scatter well, but they are slow. MurmurHash3 and xxHash give excellent distribution at 10x the speed. For consistent hashing, cryptographic strength is irrelevant. You care about uniformity, not collision resistance.
Replication on the ring. Many systems store keys on the N clockwise neighbors rather than just the first one. Cassandra does this: with a replication factor of 3, a key is stored on the next 3 distinct physical servers clockwise from the key's position. "Distinct physical" matters because virtual nodes can cause the next three ring positions to belong to the same physical server. Implementations must skip virtual nodes that map back to an already-selected physical server.
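The skip logic is a few lines once the ring is a pair of parallel sorted arrays. A sketch of building such a preference list:

```python
import bisect

def preference_list(positions: list[int], servers: list[str],
                    key_hash: int, rf: int = 3) -> list[str]:
    """Collect the next rf DISTINCT physical servers clockwise from the key,
    skipping virtual nodes owned by an already-selected server."""
    idx = bisect.bisect_left(positions, key_hash)
    chosen: list[str] = []
    for step in range(len(positions)):
        server = servers[(idx + step) % len(servers)]
        if server not in chosen:
            chosen.append(server)
            if len(chosen) == rf:
                break
    return chosen  # may be shorter than rf if the cluster has fewer servers
```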
Hot keys break everything. Consistent hashing distributes keys evenly, but it cannot distribute load evenly if one key gets 1000x more traffic than others. A celebrity tweet cached on server A makes server A a hotspot regardless of how balanced the ring is. Solutions include local caching layers before the consistent hash, request rate limiting per key, or replicating hot keys to multiple servers.
Real World Behavior Under Failure
When a server fails, its keys shift clockwise to the next servers on the ring. Without virtual nodes, a single successor inherits the entire failed load, roughly doubling its traffic; with virtual nodes, the failed server's segments scatter across many successors, which softens but does not eliminate the spike. If the system is running at 60% capacity and one server absorbs a full failed peer, it jumps to 120%, which may cascade.
This is why production systems combine consistent hashing with load monitoring. Cassandra uses consistent hashing for data placement but also tracks per-node load and can trigger streaming operations to rebalance when one node is overloaded. DynamoDB uses consistent hashing for partitioning but adds a control plane that can split hot partitions and move them to less-loaded nodes.
The consistent hashing paper describes an elegant algorithm. Running it in production requires wrapping it in operational tooling that handles the messy realities of uneven load, hot keys, and cascading failures.
Key Points
- Adding or removing a server with modulo hashing (hash(key) % N) remaps nearly every key. With 100 servers, adding one server moves about 99% of your data. Consistent hashing remaps only 1/N of keys on average
- Virtual nodes smooth out load distribution by giving each physical server 100-200 positions on the hash ring. Without them, servers can own wildly unequal segments due to hash clustering
- Jump Hash from Google achieves perfect uniformity with zero memory overhead, but only supports append-only scaling. You can add bucket N+1 but you cannot remove bucket 5
- Rendezvous hashing (highest random weight) needs no ring data structure at all. For each key, hash against every server and pick the highest. Clean and elegant, but O(N) per lookup limits it to smaller clusters
- Maglev hashing builds a lookup table for O(1) queries after construction. Google uses it in their network load balancers where per-packet latency matters
Common Mistakes
- Using too few virtual nodes. With 10 virtual nodes per server, load imbalance can exceed 2x between the most-loaded and least-loaded servers. At 150+ virtual nodes, variance drops below 10%
- Giving identical virtual node counts to servers with different capacities. A 64GB machine and a 16GB machine should not own the same fraction of the ring
- Reaching for consistent hashing when the server set is small and stable. With 5 servers that never change, modulo hashing is faster, simpler, and perfectly uniform
- Not monitoring actual key distribution after deployment. Hash functions can cluster on real-world key patterns even when they look uniform on random inputs