HashMap
An array of buckets, a hash function, and linked lists for collisions. Simple pieces that combine into the most used data structure in software engineering.
Key Abstractions
HashMap: Public API for put, get, remove. Manages capacity and triggers resizing.
Entry: Key-value-hash triple. Stores the precomputed hash to avoid rehashing during resize.
HashFunction: Strategy interface for computing hash codes. Swappable for different key types.
Class Diagram
How It Works
Every HashMap boils down to the same three-step dance: hash the key, pick a bucket, and handle collisions. You take a key, run it through a hash function to get an integer, then map that integer to an array index. When two keys land on the same index (and they will), you need a collision resolution strategy.
We use separate chaining here. Each bucket holds a linked list of entries. When a collision happens, the new entry gets prepended to the list at that index. It sounds like it could get slow, but as long as the load factor stays below 0.75, most chains have zero or one entry. Average case stays O(1).
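The three-step dance can be sketched in miniature before the full implementation below. This is a toy sketch with a fixed bucket count and list-of-pairs chains; the names here are illustrative only.

```python
# Toy hash map: fixed 8 buckets, each bucket a list of [key, value] pairs.
# The real implementation below resizes dynamically and uses linked nodes.
buckets = [[] for _ in range(8)]

def put(key, value):
    idx = hash(key) % 8                # step 1 & 2: hash the key, pick a bucket
    for pair in buckets[idx]:          # step 3: collision -> scan the chain
        if pair[0] == key:
            pair[1] = value            # existing key: update in place
            return
    buckets[idx].append([key, value])  # new key: join the chain

def get(key):
    idx = hash(key) % 8
    for k, v in buckets[idx]:
        if k == key:
            return v
    return None                        # missing key

put("a", 1)
put("a", 2)                            # update, not duplicate
print(get("a"))  # 2
```

With 8 buckets and more than 8 keys, some chains necessarily hold multiple pairs; lookups still work, just with a short linear scan.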
Requirements
Functional
- put(key, value) inserts a new entry or updates an existing one
- get(key) returns the associated value, or null/None if the key doesn't exist
- remove(key) deletes the entry and returns whether it existed
- Automatically resizes when the number of entries exceeds the load factor threshold
Non-Functional
- O(1) amortized time for put, get, and remove
- Thread-safe for concurrent access
- Hash function should be swappable via Strategy pattern
Design Decisions
Why separate chaining over open addressing?
Open addressing (linear probing, double hashing) stores everything in the main array. Sounds cache-friendly, and it is. But deletions become painful. You need tombstone markers, and over time those tombstones degrade performance unless you periodically rehash. Separate chaining handles deletions cleanly since you just unlink a node. It also degrades more gracefully under high load. When chains grow, performance drops linearly. Open addressing can hit clustering cascades where a single hot region tanks the whole table.
Why power-of-2 capacity?
With an arbitrary capacity, you compute the bucket index using modulo: hash % capacity. Division is slow. With power-of-2 capacity, you replace it with a bitmask: hash & (capacity - 1). Same result, much faster. The tradeoff is that poor hash functions with patterns in the low bits will cluster more. That's why we include bit-spreading in the hash function, mixing high bits into low bits to counteract this.
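Both the equivalence and the effect of bit spreading are easy to check directly (capacity 16 here matches the implementation's default):

```python
# With power-of-2 capacity, hash & (capacity - 1) equals hash % capacity
# for non-negative hashes. capacity = 16 -> mask = 0b1111.
capacity = 16
mask = capacity - 1

for h in (0, 7, 16, 12345, 2**31 - 1):
    assert h % capacity == h & mask

# Bit spreading: fold high bits into low bits so keys that differ only
# in their high bits don't all land in the same bucket.
def spread(h: int) -> int:
    return h ^ (h >> 16)

a, b = 0x10000, 0x20000                    # differ only above bit 16
print(a & mask, b & mask)                  # 0 0 -- same bucket without spreading
print(spread(a) & mask, spread(b) & mask)  # 1 2 -- separated after spreading
```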
Why rehash at 0.75 load factor?
It's a balance between space and time. At 0.5 you waste half your memory. At 1.0 you have an average chain length of 1 but worst-case chains get long enough to hurt. 0.75 is the sweet spot that Java's HashMap settled on after benchmarking, and it works well in practice. Most buckets hold 0 or 1 entries, and resize frequency stays reasonable.
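The "most buckets hold 0 or 1 entries" claim can be checked numerically: under a uniform hash, bucket occupancy is approximately Poisson-distributed with mean equal to the load factor.

```python
import math

# Fraction of buckets holding exactly k entries at load factor lf is
# approximately e^-lf * lf^k / k! (Poisson approximation, uniform hashing).
for lf in (0.5, 0.75, 1.0):
    p0 = math.exp(-lf)            # empty buckets
    p1 = math.exp(-lf) * lf       # single-entry buckets
    print(f"load {lf}: ~{p0:.0%} empty, ~{p1:.0%} single-entry")
```

At 0.75, roughly 80% of buckets hold zero or one entry, so the average successful lookup touches barely more than one node.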
Why store the hash code in each Entry?
During resize, every single entry needs to be re-indexed into the new bucket array. If you don't store the hash, you have to recompute it for every entry. For objects with expensive hashCode() implementations (like long strings), that adds up fast. One extra int per entry is a small price for avoiding redundant computation during resize.
Interview Follow-ups
- "How would you handle a hash-flooding attack?" Use a randomized hash function or switch to a balanced tree for buckets once chain length exceeds a threshold (this is what Java 8+ does at length 8). Randomization prevents attackers from crafting inputs that all hash to the same bucket.
- "What about concurrent reads without locking?" You could use a lock-free approach with volatile reads on the bucket array and CAS operations for writes. Java's ConcurrentHashMap uses segment-level locking and lock-free reads as a middle ground.
- "How would you implement iteration?" Walk the bucket array left to right, follow each chain to completion. Modification during iteration is the hard part. You either snapshot or use a fail-fast counter that detects structural changes.
- "Why not use a balanced BST per bucket instead of a linked list?" For most workloads, chains are short enough that linked list traversal is faster due to lower overhead. But if you expect adversarial inputs, trees cap worst-case lookup at O(log n) per bucket instead of O(n).
Code Implementation
from __future__ import annotations
from abc import ABC, abstractmethod
from threading import Lock
from typing import Any, Iterator


class HashFunction(ABC):
    """Strategy interface for hashing keys."""

    @abstractmethod
    def hash(self, key: Any) -> int: ...


class DefaultHashFunction(HashFunction):
    """Uses Python's built-in hash with bit spreading to reduce clustering."""

    def hash(self, key: Any) -> int:
        h = hash(key)
        # Spread high bits into low bits (similar to Java's HashMap)
        return h ^ (h >> 16)


class Entry:
    """Key-value-hash triple. Linked via next pointer for chaining."""

    __slots__ = ("key", "value", "hash_code", "next")

    def __init__(self, key: Any, value: Any, hash_code: int):
        self.key = key
        self.value = value
        self.hash_code = hash_code
        self.next: Entry | None = None


class HashMap:
    """Hash map with separate chaining, dynamic resizing, and pluggable hash function."""

    _DEFAULT_CAPACITY = 16
    _LOAD_FACTOR = 0.75

    def __init__(self, capacity: int = _DEFAULT_CAPACITY, hash_fn: HashFunction | None = None):
        # Round capacity up to the nearest power of 2
        self._capacity = self._next_power_of_two(capacity)
        self._size = 0
        self._buckets: list[Entry | None] = [None] * self._capacity
        self._hash_fn = hash_fn or DefaultHashFunction()
        self._lock = Lock()

    @staticmethod
    def _next_power_of_two(n: int) -> int:
        if n <= 1:
            return 1
        n -= 1
        n |= n >> 1
        n |= n >> 2
        n |= n >> 4
        n |= n >> 8
        n |= n >> 16
        return n + 1

    def _index(self, hash_code: int) -> int:
        return hash_code & (self._capacity - 1)

    def put(self, key: Any, value: Any) -> None:
        with self._lock:
            h = self._hash_fn.hash(key)
            idx = self._index(h)
            entry = self._buckets[idx]

            while entry is not None:
                if entry.hash_code == h and entry.key == key:
                    entry.value = value
                    return
                entry = entry.next

            new_entry = Entry(key, value, h)
            new_entry.next = self._buckets[idx]
            self._buckets[idx] = new_entry
            self._size += 1

            if self._size > self._capacity * self._LOAD_FACTOR:
                self._resize()

    def get(self, key: Any) -> Any | None:
        with self._lock:
            h = self._hash_fn.hash(key)
            idx = self._index(h)
            entry = self._buckets[idx]

            while entry is not None:
                if entry.hash_code == h and entry.key == key:
                    return entry.value
                entry = entry.next
            return None

    def remove(self, key: Any) -> bool:
        with self._lock:
            h = self._hash_fn.hash(key)
            idx = self._index(h)
            entry = self._buckets[idx]
            prev = None

            while entry is not None:
                if entry.hash_code == h and entry.key == key:
                    if prev is None:
                        self._buckets[idx] = entry.next
                    else:
                        prev.next = entry.next
                    self._size -= 1
                    return True
                prev = entry
                entry = entry.next
            return False

    def _resize(self) -> None:
        # Called from put with the lock already held, so no extra locking here.
        old_buckets = self._buckets
        self._capacity *= 2
        self._buckets = [None] * self._capacity
        self._size = 0

        for head in old_buckets:
            entry = head
            while entry is not None:
                # Reuse the stored hash; no need to recompute it.
                self._put_internal(entry.key, entry.value, entry.hash_code)
                entry = entry.next

    def _put_internal(self, key: Any, value: Any, hash_code: int) -> None:
        # Lock-free insert used only during resize, while the lock is held.
        idx = self._index(hash_code)
        new_entry = Entry(key, value, hash_code)
        new_entry.next = self._buckets[idx]
        self._buckets[idx] = new_entry
        self._size += 1

    def __len__(self) -> int:
        return self._size

    def __contains__(self, key: Any) -> bool:
        # Note: cannot distinguish a stored None value from a missing key.
        return self.get(key) is not None

    def keys(self) -> Iterator[Any]:
        # Not synchronized; avoid structural modification while iterating.
        for head in self._buckets:
            entry = head
            while entry is not None:
                yield entry.key
                entry = entry.next


if __name__ == "__main__":
    hm = HashMap(capacity=4)

    # Basic put and get
    hm.put("name", "Alice")
    hm.put("age", 30)
    hm.put("city", "Seattle")
    print(f"get('name') = {hm.get('name')}")  # Alice
    print(f"get('age') = {hm.get('age')}")    # 30
    print(f"size = {len(hm)}")                # 3

    # Update existing key
    hm.put("age", 31)
    print(f"get('age') after update = {hm.get('age')}")  # 31

    # Force collisions and resize by adding more entries
    for i in range(20):
        hm.put(f"key_{i}", i)
    print(f"size after bulk insert = {len(hm)}")  # 23
    print(f"get('key_15') = {hm.get('key_15')}")  # 15

    # Remove
    removed = hm.remove("name")
    print(f"remove('name') = {removed}")                   # True
    print(f"get('name') after remove = {hm.get('name')}")  # None

    # Contains
    print(f"'city' in map = {'city' in hm}")  # True
    print(f"'name' in map = {'name' in hm}")  # False
    print("All operations completed successfully.")

Common Mistakes
- ✗ Using key equality without checking hash first. Compare hashes before calling equals() to short-circuit expensive comparisons.
- ✗ Forgetting to handle null keys or duplicate keys on put(). Duplicates should update the value, not create a second entry.
- ✗ Not resizing. Without resize, chains grow linearly and your O(1) map degrades to O(n).
- ✗ Rehashing with the old capacity instead of the new one. Every entry must be re-indexed against the new capacity.
Key Points
- ✓ Separate chaining handles collisions by prepending to the linked list at each bucket index.
- ✓ Power-of-2 capacity lets you replace modulo with bitwise AND. index = hash & (capacity - 1) is faster.
- ✓ Rehashing at 0.75 load factor keeps average chain length short, preserving O(1) amortized access.
- ✓ Storing the hash in each Entry avoids recomputing it during resize. Small cost, big win at scale.