Topological Sort
The Intuition
Imagine you're getting dressed in the morning. You can't put shoes on before socks. You can't button your shirt before you've actually put it on. Each clothing item has dependencies: things that must come first. Your goal is to find an order that respects every single dependency.
Start by looking for items with zero dependencies. Underwear, socks, undershirt. Nothing needs to come before them, so you can put them on right now. Once those are on, check what they've unlocked. Socks are done, so shoes are now available. Undershirt is done, so your dress shirt is now available. Pick any newly unblocked item, put it on, and see what it unlocks next. Repeat until you're fully dressed.
That's Kahn's algorithm. Count how many dependencies each item has (its "in-degree"). Start with everything that has zero dependencies. Process one, reduce the in-degree of everything that depended on it. Whenever something's count hits zero, it's ready. If you process everything and some items still have unmet dependencies, you've got a circular dependency, like needing your shoes on before you can put on your socks. That's a cycle, and no valid order exists.
The DFS approach flips the perspective. Instead of starting from items with no dependencies, you dive deep into dependency chains. You keep following "what does this depend on?" all the way to the bottom, then record items in reverse as you climb back up. The last thing you finish exploring goes first in the final order.
Course scheduling, build system dependencies, task ordering, compilation order, or any problem with prerequisites/dependencies that form a DAG.
- Graph has cycles (no valid ordering exists)
- Undirected graph (topological sort is for DAGs only)
- Need shortest path (use BFS/Dijkstra)
Variations:
- Kahn's algorithm (BFS): Uses in-degree counting; naturally detects cycles.
- DFS-based: Records nodes in reverse post-order; simpler to implement recursively.
- Parallel scheduling: Use level-by-level BFS to find which tasks can run concurrently.
- Cycle exists so no valid topological order is possible
- Single node with no dependencies
- Disconnected DAG with multiple independent subgraphs
Key Points
- •Only works on Directed Acyclic Graphs (DAGs)
- •Kahn's algorithm uses in-degree counting + BFS
- •DFS approach records nodes in reverse post-order
- •If result size < number of nodes, a cycle exists
- •Multiple valid orderings may exist
Code Template
1 from collections import deque
2
3 def topological_sort_kahn(graph, n):
4 """Kahn's algorithm (BFS-based topological sort)."""
5 in_degree = [0] * n
6 for u in range(n):
7 for v in graph[u]:
8 in_degree[v] += 1
9
10 queue = deque(i for i in range(n) if in_degree[i] == 0)
11 order = []
12
13 while queue:
14 node = queue.popleft()
15 order.append(node)
16 for neighbor in graph[node]:
17 in_degree[neighbor] -= 1
18 if in_degree[neighbor] == 0:
19 queue.append(neighbor)
20
21 if len(order) != n:
22 return [] # cycle detected
23 return order
24
25 def topological_sort_dfs(graph, n):
26 """DFS-based topological sort (reverse post-order)."""
27 visited = [False] * n
28 stack = []
29
30 def dfs(u):
31 visited[u] = True
32 for v in graph[u]:
33 if not visited[v]:
34 dfs(v)
35 stack.append(u)
36
37 for i in range(n):
38 if not visited[i]:
39 dfs(i)
40
41 return stack[::-1]Common Mistakes
- Applying to graphs with cycles (no valid ordering exists)
- Forgetting to detect cycles via result size check
- Not initializing in-degree array for all nodes