Sharded HashMap
Naive synchronized HashMap serialises every operation. The sharded version splits the data into N independent maps, each with its own lock. Hash key → shard index → operate on that shard's map. Java's pre-8 ConcurrentHashMap did this with 16 lock-striped segments by default. Reads run in parallel; writes serialise only within a shard.
Why sharding
A synchronized HashMap puts one lock around every operation. Every get, every put, every remove, on every key, takes the same lock. At low traffic this is fine. At a hundred thousand operations per second across many threads, the lock itself becomes the bottleneck. Threads queue on it, the kernel parks and wakes them, and most of the CPU is spent waiting rather than working.
Sharding splits the one map into N smaller maps, each with its own lock. A key picks its shard by hashing the key and taking the result modulo N. Two operations on different shards never touch the same lock, so they run in parallel. Two operations on the same shard still serialise, but only against each other, not against every other operation in the map.
The contrast, in pictures:
In the top diagram, all three threads hit the same lock and the writes serialise. In the bottom diagram, each thread's key hashes to a different shard, the three locks are independent, and the three writes run truly in parallel. Shard 1 happens to be empty in this snapshot; that is fine, an empty shard is just an unused lock and an empty map.
The underlying idea is called lock striping: if two operations do not touch the same key, they should not share a lock. The pattern applies far beyond hash maps. Sharded counters, sharded caches, partitioned queues, anything that can be partitioned by a key can be lock-striped the same way.
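As a sketch of lock striping applied to something other than a map, here is a striped counter: increments land on random cells so concurrent writers rarely collide, and a read sums the cells. This is the same idea behind Java's `LongAdder`, though `LongAdder` is considerably more sophisticated; the class below is illustrative only.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLongArray;

// Minimal striped counter: increments are spread across N cells so
// concurrent writers rarely contend on the same slot. A read sums
// all cells; the total is exact once writers quiesce.
class StripedCounter {
    private final AtomicLongArray cells;
    private final int mask;

    StripedCounter(int stripes) {          // stripes must be a power of two
        this.cells = new AtomicLongArray(stripes);
        this.mask = stripes - 1;
    }

    void increment() {
        // Any cell works: only the sum over all cells matters,
        // so a thread-random index is enough to spread contention.
        int i = ThreadLocalRandom.current().nextInt() & mask;
        cells.incrementAndGet(i);
    }

    long sum() {
        long total = 0;
        for (int i = 0; i < cells.length(); i++) total += cells.get(i);
        return total;
    }
}
```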
How to choose N
The number of shards is a tuning parameter, not a fundamental property. Picking it well is mostly common sense.
| N | Tradeoff |
|---|---|
| 1 | No sharding, the single global lock is back |
| 16 | Pre-Java-8 ConcurrentHashMap's default segment count. Reasonable for most workloads. |
| 32 to 64 | Most production caches. Fits the "tens of writer threads" range comfortably. |
| 128 or more | Diminishing returns. The locks themselves start wasting cache lines and the indirection cost grows. |
Two practical rules:
- N should be at least the expected concurrent thread count. Otherwise threads still serialise.
- Make N a power of two. Then the shard index is `hash & (N - 1)`, a single AND instruction; modulo on a non-power-of-two N takes several cycles, which matters at cache-hot operation rates.
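The equivalence is easy to check. For a power-of-two N, `hash & (N - 1)` keeps exactly the low bits that `Math.floorMod(hash, N)` would; note that plain `%` returns a negative index for negative hashes in Java, which is a second reason to mask rather than mod:

```java
public class MaskDemo {
    public static void main(String[] args) {
        int n = 16;                        // power of two
        int[] hashes = { 5, 12345, -7, Integer.MIN_VALUE };
        for (int h : hashes) {
            int byMask = h & (n - 1);
            int byMod  = Math.floorMod(h, n);
            // Plain h % n can be negative for negative h; masking cannot.
            System.out.printf("h=%d mask=%d floorMod=%d %%=%d%n",
                              h, byMask, byMod, h % n);
            assert byMask == byMod;
        }
    }
}
```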
The hot-key problem
Sharding helps when the load is uniformly distributed across keys. If most operations target one specific key (everyone is reading user_id = 1), every one of those operations hashes to the same shard and contends on that shard's lock. The other shards sit idle. The sharded map gives no win in this case; the hot key has just become the new bottleneck.
The fix is not "more shards". The hot-key bottleneck is structural and needs a different design:
- Replicate the hot key. Each thread or each region holds its own copy. Reads are local. Writes fan out (rare) or use eventual consistency.
- Read-through cache in front of the shard. Each thread has a small per-thread cache that absorbs reads of the hot key without touching the shared map.
- Use an atomic primitive for the hot key. If the value is a counter or a small piece of state, an `AtomicLong` (or Go's `atomic.Int64`) is faster than any lock-based shard.
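For the counter case, a sketch of the third option: the hot key gets a dedicated `AtomicLong` and never touches a lock, while cold keys go through an ordinary concurrent map. The hot-key name and the hard-coded split are assumptions for illustration; real code would detect or configure the hot key.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical hybrid: one known-hot counter handled lock-free,
// everything else in a concurrent map of per-key counters.
class HitCounter {
    private static final String HOT_KEY = "user:1";   // illustrative
    private final AtomicLong hot = new AtomicLong();
    private final Map<String, AtomicLong> cold = new ConcurrentHashMap<>();

    long increment(String key) {
        if (HOT_KEY.equals(key)) {
            return hot.incrementAndGet();             // single CAS, no lock
        }
        return cold.computeIfAbsent(key, k -> new AtomicLong())
                   .incrementAndGet();
    }

    long get(String key) {
        if (HOT_KEY.equals(key)) return hot.get();
        AtomicLong v = cold.get(key);
        return v == null ? 0 : v.get();
    }
}
```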
The general principle: sharding solves "too many threads hitting one lock for unrelated keys". It does not solve "too many threads hitting one key".
Java 8+ ConcurrentHashMap goes further
Java 8 abandoned segment-level locking entirely in favour of per-bin concurrency control. Each bucket in the hash table is a short chain (or, when collisions get bad, a red-black tree). Inserting into an empty bucket is a single CAS on the head pointer, with no lock at all; operations on a non-empty bucket synchronise on that bucket's head node only. Resize is incremental and cooperative: the table grows in chunks while concurrent operations help migrate entries.
The effect is finer-grained parallelism than any fixed sharding scheme. Two writers that hit different buckets do not contend, even if those buckets are in the same "shard" by any 16-shard partitioning. The API stays Map, so application code does not change.
For production Java code, do not hand-roll a sharded map. ConcurrentHashMap is faster than any fixed N-shard implementation in almost every workload, and it is already in the standard library. Roll a custom sharded design only when there is a real reason the standard map cannot be used: custom eviction policy, per-shard observability, deterministic memory layout, or similar specialised requirements.
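For reference, the production-Java answer is one line of setup. The `compute`-family methods (`merge`, `computeIfAbsent`, etc.) give atomic per-key updates with no explicit locking in application code:

```java
import java.util.concurrent.ConcurrentHashMap;

public class ChmDemo {
    public static void main(String[] args) {
        ConcurrentHashMap<String, Long> counts = new ConcurrentHashMap<>();

        // merge() is atomic per key: safe to call from many threads.
        counts.merge("a", 1L, Long::sum);
        counts.merge("a", 1L, Long::sum);
        counts.merge("b", 1L, Long::sum);

        System.out.println(counts.get("a")); // 2
        System.out.println(counts.get("b")); // 1
    }
}
```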
The interview answer
"Sharded map: hash the key, pick a shard, lock only that shard. Sixteen to sixty-four shards is the usual range, power of two for fast indexing. ConcurrentHashMap does this better with per-bucket CAS, so for production Java the answer is to use ConcurrentHashMap directly. The sharded design is the right answer when the question is about how concurrent maps work or when the standard map cannot be used for some specific reason."
Implementations
N independent shards, each is a (lock, map) pair. Key hash chooses the shard. Operations on different shards run in parallel.
```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class ShardedMap<K, V> {
    private final Shard<K, V>[] shards;
    private final int mask;

    @SuppressWarnings("unchecked")
    public ShardedMap(int shardCount) {
        // Round up to the next power of two (minimum 1 shard).
        int n = 1;
        while (n < shardCount) n <<= 1;
        this.mask = n - 1;
        this.shards = new Shard[n];
        for (int i = 0; i < n; i++) shards[i] = new Shard<>();
    }

    private Shard<K, V> shardFor(K key) {
        int h = key.hashCode();
        return shards[(h ^ (h >>> 16)) & mask]; // mix high bits into low bits
    }

    public V get(K key) {
        Shard<K, V> s = shardFor(key);
        s.lock.readLock().lock();
        try { return s.map.get(key); }
        finally { s.lock.readLock().unlock(); }
    }

    public V put(K key, V value) {
        Shard<K, V> s = shardFor(key);
        s.lock.writeLock().lock();
        try { return s.map.put(key, value); }
        finally { s.lock.writeLock().unlock(); }
    }

    static class Shard<K, V> {
        final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        final Map<K, V> map = new HashMap<>();
    }
}
```
Key points
- N independent shards, each with its own lock + map
- Hash key → shard index: `key.hashCode() % N` (better: bit masking with a power-of-2 N)
- Concurrent reads/writes on different shards proceed in parallel
- Resize is per-shard, or global (rare); ConcurrentHashMap instead resizes the whole table incrementally and treeifies long bins
- Java 8+ ConcurrentHashMap uses CAS plus per-bin synchronized, even finer-grained than striping
Follow-up questions
- How many shards is right?
- Why is ConcurrentHashMap better than basic sharding?
- When does sharding NOT help?
Gotchas
- !Bad hash function → uneven shard distribution → some shards always hot
- !Iterating across shards is non-atomic, snapshot is inconsistent
- !size() across shards requires summing each, non-atomic without freeze
- !Resize is per-shard, not global, capacity can drift unevenly
- !Don't use string %= shardCount on user-provided keys, collision attacks
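One common mitigation for the last gotcha is to seed the shard hash with a per-process random value, so an attacker cannot precompute keys that all land in one shard. A sketch, assuming string keys; seeded FNV-1a is an arbitrary illustrative choice, not a cryptographic guarantee (a keyed hash such as SipHash is the production-grade fix):

```java
import java.util.concurrent.ThreadLocalRandom;

// Sketch: mix a per-process random seed into the shard index so shard
// placement is unpredictable from outside the process. Seeded FNV-1a
// is illustrative only; it is not collision-resistant against an
// adversary who learns the seed.
class SeededShardIndex {
    private static final int SEED = ThreadLocalRandom.current().nextInt();
    private final int mask;

    SeededShardIndex(int shardCountPowerOfTwo) {
        this.mask = shardCountPowerOfTwo - 1;
    }

    int indexFor(String key) {
        int h = SEED;                      // random initial state per process
        for (int i = 0; i < key.length(); i++) {
            h ^= key.charAt(i);
            h *= 0x01000193;               // 32-bit FNV prime
        }
        h ^= h >>> 16;                     // spread high bits before masking
        return h & mask;
    }
}
```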