Merge K Sorted Lists
The Intuition
When k sorted streams need to be merged into one sorted output, the next element of the output is always the smallest current head across all k streams. A min-heap of size k makes "smallest current head" an O(log k) operation, which leads directly to an O(N log k) total runtime where N is the combined length.
The shape of the loop is fixed: pop the smallest, attach it to the output, then push the successor in the same stream the popped element came from. The heap always holds at most one node per stream, so its size never exceeds k. Streams that run out simply do not push a successor, so they drop out naturally.
The pattern beats the naive "concatenate everything and sort" approach for two reasons. The naive approach is O(N log N) which is worse when k is much smaller than N. It also requires materializing all elements at once, which fails when input is a stream you cannot fully buffer.
The same shape transfers to several adjacent problems. "Find the smallest range that covers at least one element from each of k lists" maintains the heap of current heads plus a running max, then asks where the difference between the heap min and the running max is smallest. "K pairs with smallest sums from two arrays" treats each pair as a node with two indices and pushes its neighbors after popping. The heap continues to hold one frontier per stream.
The technique also frames how databases do external merges and how merge sort scales beyond memory: split the input into k sorted chunks that each fit in memory, then k-way merge them.
Combining k sorted lists, arrays, files, or streams; smallest range covering k lists; k smallest pairs from two arrays; external merge sort; finding the median or kth element across k streams without joining them.
- Only two lists (linear two-pointer merge is simpler)
- The lists are not sorted
- Total size is small enough that flattening and sorting is just as fast
Variations:
- Heap of nodes with tie-breaker: When values tie, breaking on list index keeps the heap ordering total without comparing the node objects.
- Running max alongside heap min: For range-style problems, track the maximum of all values currently in the heap.
- Bidirectional merge with two heaps: Combine min-heap and max-heap to track both extremes of a streamed window.
- some lists are empty (skip them when seeding the heap)
- all lists empty (return null)
- lists of vastly different lengths (algorithm still O(N log k))
- duplicate values across lists (tie-breaker matters)
Key Points
- •Maintain a min-heap of size k holding the current head of every list
- •Pop the smallest, append to the result, push the popped node's successor
- •Heap operations are O(log k); each of the N nodes is pushed and popped once
- •The pattern generalizes any "k sorted streams" problem to O(N log k)
- •When values tie, break ties on list index to keep comparisons total
Code Template
1 import heapq
2
3 class ListNode:
4 def __init__(self, val=0, nxt=None):
5 self.val = val
6 self.next = nxt
7
8 def merge_k_sorted(lists):
9 """Merge k sorted linked lists into one. Returns the new head."""
10 heap = []
11 # Tie-breaker on list index avoids comparing ListNode objects directly.
12 for i, head in enumerate(lists):
13 if head:
14 heapq.heappush(heap, (head.val, i, head))
15 dummy = ListNode()
16 tail = dummy
17 while heap:
18 val, i, node = heapq.heappop(heap)
19 tail.next = node
20 tail = node
21 if node.next:
22 heapq.heappush(heap, (node.next.val, i, node.next))
23 return dummy.next
24
25 def smallest_range_covering_k_lists(lists):
26 """Smallest range [lo, hi] that contains at least one number from each of k sorted lists."""
27 heap = []
28 cur_max = float('-inf')
29 for i, lst in enumerate(lists):
30 if lst:
31 heapq.heappush(heap, (lst[0], i, 0))
32 cur_max = max(cur_max, lst[0])
33 best_lo, best_hi = float('-inf'), float('inf')
34 while heap:
35 val, i, idx = heapq.heappop(heap)
36 if cur_max - val < best_hi - best_lo:
37 best_lo, best_hi = val, cur_max
38 if idx + 1 == len(lists[i]):
39 break
40 next_val = lists[i][idx + 1]
41 cur_max = max(cur_max, next_val)
42 heapq.heappush(heap, (next_val, i, idx + 1))
43 return [best_lo, best_hi]Common Mistakes
- Heap holding values instead of nodes (loses the link to .next)
- Comparing nodes directly when the heap library expects total ordering (use a tuple key)
- Pushing a null successor (always check before pushing)
- Initializing the heap with all nodes flattened (defeats the purpose, becomes O(N log N))