Two Heaps (Median Finder)
The Intuition
Imagine lining up your entire class by height. The median kid is the one standing right in the middle. But re-sorting the entire line every time a new student walks in is expensive. You need a shortcut.
Split the class into two groups. Short kids go to the left group, tall kids go to the right group. In the left group, you only care about the tallest short kid (they stand at the front). In the right group, you only care about the shortest tall kid (they also stand at the front). The median is always one of these two "front" kids, or their average. You never need to look deeper into either group.
A max-heap is the left group. It bubbles the largest value to the top, so the tallest short kid is always instantly accessible. A min-heap is the right group. It bubbles the smallest value to the top, so the shortest tall kid is always instantly accessible. Peeking at both tops gives you the median in O(1).
When a new student arrives, put them in the left group first. If the tallest short kid is now taller than the shortest tall kid, that student is in the wrong group, so move them over. Then check if the groups are balanced: they should differ in size by at most 1. If one group got too big, move its front person to the other group. After these two checks, the invariant is restored: left holds the smaller half, right holds the larger half, and the tops give you the median.
Finding the median in a data stream, maintaining the median as elements are added or removed, sliding window median problems, or any scenario where you need quick access to the middle element of a growing collection.
Need arbitrary element access (use balanced BST), data is static and you just need one median (sort and pick middle), or need more than just the median (use different statistics structure).
Variations:
- Basic median finder: Add numbers one at a time, query the median at any point.
- Sliding window median: Combine two heaps with lazy deletion using a hash map to track invalidated entries.
- Weighted median: Attach weights to elements and balance by total weight rather than count.
- Two-group optimization: Some problems (like IPO) use a min-heap and max-heap not for median but to greedily select from two partitions.
- Only one element has been added, so the median is that element itself
- Two elements added, median is their average
- All elements are identical, both heaps contain the same value
- Integer overflow when summing two heap tops for the average (use long or double)
- Empty heaps when querying median before any insertions
- Negative numbers mixed with positive numbers
Key Points
- •Max-heap holds the smaller half, min-heap holds the larger half
- •Balance the heaps so their sizes differ by at most 1
- •The median is the top of the max-heap or the average of both tops
- •Always add to max-heap first, then rebalance by moving tops between heaps
- •Python uses negation to simulate a max-heap with heapq
Code Template
1 import heapq
2
3 class MedianFinder:
4 """Two-heap median finder.
5 max_heap: smaller half (negated for max-heap behavior).
6 min_heap: larger half."""
7
8 def __init__(self):
9 self.max_heap = [] # stores negated values
10 self.min_heap = []
11
12 def add_num(self, num):
13 # Push to max-heap (smaller half)
14 heapq.heappush(self.max_heap, -num)
15
16 # Ensure max-heap top <= min-heap top
17 if (self.min_heap and
18 -self.max_heap[0] > self.min_heap[0]):
19 val = -heapq.heappop(self.max_heap)
20 heapq.heappush(self.min_heap, val)
21
22 # Balance sizes: max_heap can have at most 1 extra
23 if len(self.max_heap) > len(self.min_heap) + 1:
24 val = -heapq.heappop(self.max_heap)
25 heapq.heappush(self.min_heap, val)
26 elif len(self.min_heap) > len(self.max_heap):
27 val = heapq.heappop(self.min_heap)
28 heapq.heappush(self.max_heap, -val)
29
30 def find_median(self):
31 if len(self.max_heap) > len(self.min_heap):
32 return -self.max_heap[0]
33 return (-self.max_heap[0] + self.min_heap[0]) / 2.0Common Mistakes
- Forgetting to negate values when using Python heapq as a max-heap
- Not rebalancing heaps after every insertion
- Getting the median wrong when both heaps have equal size (must average the two tops)
- Using the wrong heap for initial insertion, breaking the ordering invariant