Sharded HashMap
Naive synchronized HashMap serialises every operation. The sharded version splits the data into N independent maps, each with its own lock. Hash key → shard index → operate on that shard's map. Java's pre-8 ConcurrentHashMap did this with 16 lock-striped segments by default. Reads run in parallel; writes serialise only within a shard.
Why sharding
A synchronized HashMap puts one lock around every operation. Every get, every put, every remove, on every key, takes the same lock. At low traffic this is fine. At a hundred thousand operations per second across many threads, the lock itself becomes the bottleneck. Threads queue on it, the kernel parks and wakes them, and most of the CPU is spent waiting rather than working.
Sharding splits the one map into N smaller maps, each with its own lock. A key picks its shard by hashing the key and taking the result modulo N. Two operations on different shards never touch the same lock, so they run in parallel. Two operations on the same shard still serialise, but only against each other, not against every other operation in the map.
The contrast, in pictures:
In the top diagram, all three threads hit the same lock and the writes serialise. In the bottom diagram, each thread's key hashes to a different shard, the three locks are independent, and the three writes run truly in parallel. Shard 1 happens to be empty in this snapshot; that is fine, an empty shard is just an unused lock and an empty map.
The underlying idea is called lock striping: if two operations do not touch the same key, they should not share a lock. The pattern applies far beyond hash maps. Sharded counters, sharded caches, partitioned queues, anything that can be partitioned by a key can be lock-striped the same way.
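As a sketch of lock striping applied to something other than a map, here is a striped counter: increments land on random cells so concurrent writers rarely collide, and a read sums the cells. This is the same idea behind Java's `LongAdder`, though `LongAdder` is considerably more sophisticated; the class below is illustrative only.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLongArray;

// Minimal striped counter: increments are spread across N cells so
// concurrent writers rarely contend on the same slot. A read sums
// all cells; the total is exact once writers quiesce.
class StripedCounter {
    private final AtomicLongArray cells;
    private final int mask;

    StripedCounter(int stripes) {          // stripes must be a power of two
        this.cells = new AtomicLongArray(stripes);
        this.mask = stripes - 1;
    }

    void increment() {
        // Any cell works: only the sum over all cells matters,
        // so a thread-random index is enough to spread contention.
        int i = ThreadLocalRandom.current().nextInt() & mask;
        cells.incrementAndGet(i);
    }

    long sum() {
        long total = 0;
        for (int i = 0; i < cells.length(); i++) total += cells.get(i);
        return total;
    }
}
```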
How to choose N
The number of shards is a tuning parameter, not a fundamental property. Picking it well is mostly common sense.
| N | Tradeoff |
|---|---|
| 1 | No sharding, the single global lock is back |
| 16 | Pre-Java-8 ConcurrentHashMap's default segment count. Reasonable for most workloads. |
| 32 to 64 | Most production caches. Fits the "tens of writer threads" range comfortably. |
| 128 or more | Diminishing returns. The locks themselves start wasting cache lines and the indirection cost grows. |
Two practical rules:
- N should be at least the expected concurrent thread count. Otherwise threads still serialise.
- Make N a power of two. Then the shard index is `hash & (N - 1)`, a single AND instruction; modulo on a non-power-of-two N takes several cycles, which matters at cache-hot operation rates.
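The equivalence is easy to check. For a power-of-two N, `hash & (N - 1)` keeps exactly the low bits that `Math.floorMod(hash, N)` would; note that plain `%` returns a negative index for negative hashes in Java, which is a second reason to mask rather than mod:

```java
public class MaskDemo {
    public static void main(String[] args) {
        int n = 16;                        // power of two
        int[] hashes = { 5, 12345, -7, Integer.MIN_VALUE };
        for (int h : hashes) {
            int byMask = h & (n - 1);
            int byMod  = Math.floorMod(h, n);
            // Plain h % n can be negative for negative h; masking cannot.
            System.out.printf("h=%d mask=%d floorMod=%d %%=%d%n",
                              h, byMask, byMod, h % n);
            assert byMask == byMod;
        }
    }
}
```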
The hot-key problem
Sharding helps when the load is uniformly distributed across keys. If most operations target one specific key (everyone is reading user_id = 1), every one of those operations hashes to the same shard and contends on that shard's lock. The other shards sit idle. The sharded map gives no win in this case; the hot key has just become the new bottleneck.
The fix is not "more shards". The hot-key bottleneck is structural and needs a different design:
- Replicate the hot key. Each thread or each region holds its own copy. Reads are local. Writes fan out (rare) or use eventual consistency.
- Read-through cache in front of the shard. Each thread has a small per-thread cache that absorbs reads of the hot key without touching the shared map.
- Use an atomic primitive for the hot key. If the value is a counter or a small piece of state, an `AtomicLong` (or Go's `atomic.Int64`) is faster than any lock-based shard.
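For the counter case, a sketch of the third option: the hot key gets a dedicated `AtomicLong` and never touches a lock, while cold keys go through an ordinary concurrent map. The hot-key name and the hard-coded split are assumptions for illustration; real code would detect or configure the hot key.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical hybrid: one known-hot counter handled lock-free,
// everything else in a concurrent map of per-key counters.
class HitCounter {
    private static final String HOT_KEY = "user:1";   // illustrative
    private final AtomicLong hot = new AtomicLong();
    private final Map<String, AtomicLong> cold = new ConcurrentHashMap<>();

    long increment(String key) {
        if (HOT_KEY.equals(key)) {
            return hot.incrementAndGet();             // single CAS, no lock
        }
        return cold.computeIfAbsent(key, k -> new AtomicLong())
                   .incrementAndGet();
    }

    long get(String key) {
        if (HOT_KEY.equals(key)) return hot.get();
        AtomicLong v = cold.get(key);
        return v == null ? 0 : v.get();
    }
}
```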
The general principle: sharding solves "too many threads hitting one lock for unrelated keys". It does not solve "too many threads hitting one key".
Java 8+ ConcurrentHashMap goes further
Java 8 abandoned segment-level locking entirely in favour of per-bin concurrency control. Each bucket in the hash table is a short chain (or, when collisions get bad, a red-black tree). Inserting into an empty bucket is a single CAS on the head pointer, with no lock at all; operations on a non-empty bucket synchronise on that bucket's head node only. Resize is incremental and cooperative: the table grows in chunks while concurrent operations help migrate entries.
The effect is finer-grained parallelism than any fixed sharding scheme. Two writers that hit different buckets do not contend, even if those buckets are in the same "shard" by any 16-shard partitioning. The API stays Map, so application code does not change.
For production Java code, do not hand-roll a sharded map. ConcurrentHashMap is faster than any fixed N-shard implementation in almost every workload, and it is already in the standard library. Roll a custom sharded design only when there is a real reason the standard map cannot be used: custom eviction policy, per-shard observability, deterministic memory layout, or similar specialised requirements.
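For reference, the production-Java answer is one line of setup. The `compute`-family methods (`merge`, `computeIfAbsent`, etc.) give atomic per-key updates with no explicit locking in application code:

```java
import java.util.concurrent.ConcurrentHashMap;

public class ChmDemo {
    public static void main(String[] args) {
        ConcurrentHashMap<String, Long> counts = new ConcurrentHashMap<>();

        // merge() is atomic per key: safe to call from many threads.
        counts.merge("a", 1L, Long::sum);
        counts.merge("a", 1L, Long::sum);
        counts.merge("b", 1L, Long::sum);

        System.out.println(counts.get("a")); // 2
        System.out.println(counts.get("b")); // 1
    }
}
```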
The interview answer
"Sharded map: hash the key, pick a shard, lock only that shard. Sixteen to sixty-four shards is the usual range, power of two for fast indexing. ConcurrentHashMap does this better with per-bucket CAS, so for production Java the answer is to use ConcurrentHashMap directly. The sharded design is the right answer when the question is about how concurrent maps work or when the standard map cannot be used for some specific reason."
Implementations
N independent shards, each is a (lock, map) pair. Key hash chooses the shard. Operations on different shards run in parallel.
```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class ShardedMap<K, V> {
    private final Shard<K, V>[] shards;
    private final int mask;

    @SuppressWarnings("unchecked")
    public ShardedMap(int shardCount) {
        // Round up to the next power of two (minimum 1 shard).
        int n = 1;
        while (n < shardCount) n <<= 1;
        this.mask = n - 1;
        this.shards = new Shard[n];
        for (int i = 0; i < n; i++) shards[i] = new Shard<>();
    }

    private Shard<K, V> shardFor(K key) {
        int h = key.hashCode();
        return shards[(h ^ (h >>> 16)) & mask]; // mix high bits into low bits
    }

    public V get(K key) {
        Shard<K, V> s = shardFor(key);
        s.lock.readLock().lock();
        try { return s.map.get(key); }
        finally { s.lock.readLock().unlock(); }
    }

    public V put(K key, V value) {
        Shard<K, V> s = shardFor(key);
        s.lock.writeLock().lock();
        try { return s.map.put(key, value); }
        finally { s.lock.writeLock().unlock(); }
    }

    static class Shard<K, V> {
        final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        final Map<K, V> map = new HashMap<>();
    }
}
```
Key points
- N independent shards, each with its own lock + map
- Hash key → shard index: `key.hashCode() % N` (better: bit masking with a power-of-2 N)
- Concurrent reads/writes on different shards proceed in parallel
- Resize is per-shard, or global (rare); ConcurrentHashMap instead resizes the whole table incrementally and treeifies long bins
- Java 8+ ConcurrentHashMap uses CAS plus per-bin synchronized, even finer-grained than striping
Follow-up questions
- How many shards is right?
- Why is ConcurrentHashMap better than basic sharding?
- When does sharding NOT help?
Gotchas
- !Bad hash function → uneven shard distribution → some shards always hot
- !Iterating across shards is non-atomic, snapshot is inconsistent
- !size() across shards requires summing each, non-atomic without freeze
- !Resize is per-shard, not global, capacity can drift unevenly
- !Don't use string %= shardCount on user-provided keys, collision attacks
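One common mitigation for the last gotcha is to seed the shard hash with a per-process random value, so an attacker cannot precompute keys that all land in one shard. A sketch, assuming string keys; seeded FNV-1a is an arbitrary illustrative choice, not a cryptographic guarantee (a keyed hash such as SipHash is the production-grade fix):

```java
import java.util.concurrent.ThreadLocalRandom;

// Sketch: mix a per-process random seed into the shard index so shard
// placement is unpredictable from outside the process. Seeded FNV-1a
// is illustrative only; it is not collision-resistant against an
// adversary who learns the seed.
class SeededShardIndex {
    private static final int SEED = ThreadLocalRandom.current().nextInt();
    private final int mask;

    SeededShardIndex(int shardCountPowerOfTwo) {
        this.mask = shardCountPowerOfTwo - 1;
    }

    int indexFor(String key) {
        int h = SEED;                      // random initial state per process
        for (int i = 0; i < key.length(); i++) {
            h ^= key.charAt(i);
            h *= 0x01000193;               // 32-bit FNV prime
        }
        h ^= h >>> 16;                     // spread high bits before masking
        return h & mask;
    }
}
```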