Two Heaps (Median Finder)
The Intuition
The two-heap pattern maintains the median of a dynamic collection in O(log n) per insertion and O(1) per query. The dataset is split into two halves:
- Lower half (max-heap,
lo). The largest value in this heap is at its root and is always accessible inO(1). - Upper half (min-heap,
hi). The smallest value in this heap is at its root.
Two invariants must hold after every operation:
- Order: every value in
lois<=every value inhi. - Balance:
len(lo) == len(hi)orlen(lo) == len(hi) + 1. The lower half is allowed to have at most one extra element.
Given those invariants, the median is lo[0] when sizes differ, or (lo[0] + hi[0]) / 2 when they are equal.
The standard insertion algorithm in three lines:
- Push the new value to
lo. - Move
lo's root tohi(hi.push(lo.pop())). This guarantees the order invariant. - If
hiis now larger thanlo, movehi's root back tolo. This guarantees the balance invariant.
The pattern generalizes beyond medians whenever a problem requires "smallest k values vs largest k values" or "below threshold vs above threshold" with dynamic balancing. Two-heap solutions show up for IPO (LC 502), where one heap holds projects ordered by capital required and the other holds available projects ordered by profit, and for sliding-window median (LC 480), where lazy deletion via a hash map handles values that have left the window.
Finding the median in a data stream, maintaining the median as elements are added or removed, sliding window median problems, or any scenario where you need quick access to the middle element of a growing collection.
Need arbitrary element access (use balanced BST), data is static and you just need one median (sort and pick middle), or need more than just the median (use different statistics structure).
- Only one element has been added, so the median is that element itself
- Two elements added, median is their average
- All elements are identical, both heaps contain the same value
- Integer overflow when summing two heap tops for the average (use long or double)
- Empty heaps when querying median before any insertions
- Negative numbers mixed with positive numbers
Key Points
- •Max-heap holds the smaller half, min-heap holds the larger half
- •Balance the heaps so their sizes differ by at most 1
- •The median is the top of the max-heap or the average of both tops
- •Always add to max-heap first, then rebalance by moving tops between heaps
- •Python uses negation to simulate a max-heap with heapq
Code Template
import heapq
class MedianFinder:
"""Two-heap median finder.
max_heap: smaller half (negated for max-heap behavior).
min_heap: larger half."""
def __init__(self):
self.max_heap = [] # stores negated values
self.min_heap = []
def add_num(self, num):
# Push to max-heap (smaller half)
heapq.heappush(self.max_heap, -num)
# Ensure max-heap top <= min-heap top
if (self.min_heap and
-self.max_heap[0] > self.min_heap[0]):
val = -heapq.heappop(self.max_heap)
heapq.heappush(self.min_heap, val)
# Balance sizes: max_heap can have at most 1 extra
if len(self.max_heap) > len(self.min_heap) + 1:
val = -heapq.heappop(self.max_heap)
heapq.heappush(self.min_heap, val)
elif len(self.min_heap) > len(self.max_heap):
val = heapq.heappop(self.min_heap)
heapq.heappush(self.max_heap, -val)
def find_median(self):
if len(self.max_heap) > len(self.min_heap):
return -self.max_heap[0]
return (-self.max_heap[0] + self.min_heap[0]) / 2.0Common Mistakes
- Forgetting to negate values when using Python heapq as a max-heap
- Not rebalancing heaps after every insertion
- Getting the median wrong when both heaps have equal size (must average the two tops)
- Using the wrong heap for initial insertion, breaking the ordering invariant