Minimum Spanning Tree
The Intuition
Imagine you're the governor of a region with a dozen villages, and you need to build roads so every village can reach every other village. You've surveyed all possible road segments and know the cost of each. You want to spend as little money as possible while keeping every village connected. You don't need direct roads between every pair. You just need enough roads that you can drive from any village to any other, possibly through intermediate villages.
Kruskal's approach is beautifully greedy. Sort every possible road segment by cost, cheapest first. Pick the cheapest road. Does it connect two villages that aren't already connected? Build it. Does it connect two villages that can already reach each other through roads you've already built? Skip it, because it would be wasted money creating a redundant loop. Move to the next cheapest road and repeat. When you've built exactly (number of villages minus 1) roads, every village is connected and you're done.
The "are these two villages already connected?" question is where Union-Find earns its keep. Each village starts in its own group. When you build a road between village A and village B, you merge their groups. Before building any road, you check whether the two villages share the same group. If they do, they're already connected and the road would create a cycle. Union-Find answers this in nearly O(1) time with path compression and union by rank. Without it, you'd need a BFS or DFS for every edge, which would be far too slow.
Prim's approach works differently but reaches the same answer. Start at any village. Look at all roads leading out of your connected cluster. Pick the cheapest one. That road pulls a new village into your cluster. Now look at all roads leaving the (now larger) cluster, pick the cheapest, and repeat. A min-heap keeps track of the frontier roads so you always grab the cheapest one efficiently. Kruskal's thinks globally (sort all roads), while Prim's thinks locally (always extend from what you have). Both produce the same minimum total cost.
Network design (cables, roads, pipelines), clustering by removing expensive edges, approximation algorithms for NP-hard problems like Traveling Salesman, or any problem asking for minimum cost to connect all nodes.
Graph is directed (MST is for undirected), need shortest path (use Dijkstra), or graph is already a tree (it's already minimal).
Variations:
- Kruskal's algorithm: Sort all edges, greedily add the smallest edge that does not form a cycle. Best for sparse graphs.
- Prim's algorithm: Start from any vertex, repeatedly add the cheapest edge connecting the tree to a new vertex. Best for dense graphs.
- Boruvka's algorithm: Each component picks its cheapest outgoing edge simultaneously. Useful for parallel MST computation.
- Graph is disconnected (no valid MST exists, return -1 or handle accordingly)
- Graph has only one vertex (MST cost is 0 with no edges)
- Multiple edges with the same weight (any valid MST is acceptable)
- Graph is already a tree (all edges are part of the MST)
- Self-loops should be ignored since they cannot be part of an MST
- Parallel edges between the same pair of vertices (pick the one with smaller weight)
- Very large graphs where sorting edges dominates runtime
Key Points
- •Kruskal's algorithm sorts edges and uses Union-Find to avoid cycles
- •Prim's algorithm grows the MST from a starting vertex using a min-heap
- •Both algorithms produce the same MST weight for a given graph
- •MST contains exactly V - 1 edges for a connected graph with V vertices
- •Kruskal's is better for sparse graphs; Prim's is better for dense graphs
Code Template
1 import heapq
2
3 class UnionFind:
4 def __init__(self, n):
5 self.parent = list(range(n))
6 self.rank = [0] * n
7
8 def find(self, x):
9 if self.parent[x] != x:
10 self.parent[x] = self.find(self.parent[x])
11 return self.parent[x]
12
13 def union(self, x, y):
14 px, py = self.find(x), self.find(y)
15 if px == py:
16 return False
17 if self.rank[px] < self.rank[py]:
18 px, py = py, px
19 self.parent[py] = px
20 if self.rank[px] == self.rank[py]:
21 self.rank[px] += 1
22 return True
23
24 def kruskal(n, edges):
25 """Kruskal's MST: sort edges by weight, union greedily.
26 edges: list of (weight, u, v)
27 """
28 edges.sort()
29 uf = UnionFind(n)
30 mst_cost = 0
31 mst_edges = 0
32
33 for weight, u, v in edges:
34 if uf.union(u, v):
35 mst_cost += weight
36 mst_edges += 1
37 if mst_edges == n - 1:
38 break
39
40 return mst_cost if mst_edges == n - 1 else -1
41
42 def prim(n, adj):
43 """Prim's MST: grow tree from vertex 0 using a min-heap.
44 adj: adjacency list of (weight, neighbor)
45 """
46 visited = [False] * n
47 min_heap = [(0, 0)] # (weight, vertex)
48 mst_cost = 0
49 mst_edges = 0
50
51 while min_heap and mst_edges < n:
52 weight, u = heapq.heappop(min_heap)
53 if visited[u]:
54 continue
55 visited[u] = True
56 mst_cost += weight
57 mst_edges += 1
58
59 for w, v in adj[u]:
60 if not visited[v]:
61 heapq.heappush(min_heap, (w, v))
62
63 return mst_cost if mst_edges == n else -1Common Mistakes
- Forgetting to check connectivity before assuming an MST exists
- Not using union by rank or path compression in Union-Find for Kruskal's
- Adding an edge that creates a cycle in Kruskal's approach
- Not handling disconnected components (result has fewer than V - 1 edges)