Heap / Top-K Pattern
The Intuition
Imagine you're a bouncer at an exclusive rooftop party with room for exactly 3 guests. A line of 1,000 people stretches around the block, and your job is to let in only the 3 tallest. You don't have time to measure and sort everyone. Instead, you let the first 3 people in. Then, for each new person in line, you compare them to the shortest person currently at the party. Taller? Kick the short one out, let the new one in. Shorter? Move along. After the entire line passes, your party of 3 holds the 3 tallest people guaranteed.
Notice what you're maintaining: a group of k people where you always know who the shortest among them is. That's a min-heap of size k. The smallest element sits at the top, ready for comparison. When someone taller shows up, you pop the top (the shortest at the party) and push the newcomer in. The heap rebalances in O(log k) time. You never sort the full crowd. You never even look at more than one person at a time.
Why a min-heap for the largest elements? It sounds backwards, but it's the key insight. The min-heap's job is to guard the door. Its root is the "weakest link" in your top-k group. Anyone shorter than the root can't possibly belong in the top k, so you reject them instantly. Anyone taller replaces the root. By the end, everything left in the heap survived every challenge.
For the k smallest elements, flip it: use a max-heap of size k. The root is now the largest in your group, and anyone smaller bumps it out. For streaming medians, you split the data into two heaps: a max-heap for the lower half and a min-heap for the upper half. The median lives at the boundary between them.
Kth largest/smallest element, top-K frequent elements, merge K sorted lists, median from data stream, task scheduling, or any problem requiring efficient access to extreme values.
- Need sorted output of all elements (just sort)
- Need frequent insertions AND deletions of arbitrary elements (use balanced BST)
- K equals N (sorting is simpler)
Variations:
- Top-K largest: Min-heap of size k; push then pop when size exceeds k.
- Top-K smallest: Max-heap of size k.
- Two-heap median: Max-heap for lower half, min-heap for upper half.
- K-way merge: Min-heap to merge K sorted arrays/lists efficiently.
- K equals the array length (return all elements)
- K equals 1 (find the single extreme element)
- All elements are equal
Key Points
- •Use a min-heap of size k to find k largest elements
- •Use a max-heap of size k to find k smallest elements
- •Maintain heap invariant: push then pop if size exceeds k
- •For median finding, use two heaps (max-heap for lower half, min-heap for upper half)
- •Python heapq is a min-heap; negate values for max-heap behavior
Code Template
1 import heapq
2
3 def top_k_largest(nums, k):
4 """Find k largest elements using min-heap of size k."""
5 heap = []
6 for num in nums:
7 heapq.heappush(heap, num)
8 if len(heap) > k:
9 heapq.heappop(heap)
10 return sorted(heap, reverse=True)
11
12 def kth_largest(nums, k):
13 """Find kth largest element."""
14 heap = []
15 for num in nums:
16 heapq.heappush(heap, num)
17 if len(heap) > k:
18 heapq.heappop(heap)
19 return heap[0]
20
21 def top_k_frequent(nums, k):
22 """Find k most frequent elements."""
23 from collections import Counter
24 count = Counter(nums)
25 return heapq.nlargest(k, count.keys(), key=count.get)
26
27 class MedianFinder:
28 """Find median from a data stream."""
29 def __init__(self):
30 self.lo = [] # max-heap (negated)
31 self.hi = [] # min-heap
32
33 def add_num(self, num):
34 heapq.heappush(self.lo, -num)
35 heapq.heappush(self.hi, -heapq.heappop(self.lo))
36 if len(self.hi) > len(self.lo):
37 heapq.heappush(self.lo, -heapq.heappop(self.hi))
38
39 def find_median(self):
40 if len(self.lo) > len(self.hi):
41 return -self.lo[0]
42 return (-self.lo[0] + self.hi[0]) / 2Common Mistakes
- Using a max-heap for top-K largest (should use min-heap of size k)
- Forgetting to negate values in Python for max-heap simulation
- Not handling the case when k > array length