Parallel Graph Traversal (BFS/DFS)
Parallel BFS: process each level in parallel; frontier-to-frontier transitions stay serial (one barrier per level). Parallel DFS: harder due to deep recursion; usually done with a task pool over subgraphs. Either way a concurrent visited set is required: ConcurrentHashMap or a Bloom filter.
When to parallelize graph traversal
- Web crawling: each page is a node; edges are links. Per-level parallelism.
- Social-network queries: "all friends-of-friends" is a two-level BFS, parallelizable.
- Dependency resolution: package managers and build systems do DAG traversal.
- PageRank, GraphX, Pregel: distributed graph-processing frameworks built on parallel BFS-like primitives.
Why BFS, not DFS
BFS has a natural barrier at each level: process all nodes at depth k, then proceed to k+1. The unit of parallelism is "all nodes at the current level." Within a level, each node's neighbor expansion is independent.
DFS has no such barrier: traversal goes as deep as possible, then backtracks. Naive parallelization breaks DFS's ordering invariants. Workaround: spawn a task per subtree and dedupe through a shared visited set.
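The task-per-subtree workaround can be sketched in Java with a ForkJoinPool. This is an illustrative assumption, not a canonical implementation: the `ParallelDFS` class and the adjacency-map representation are hypothetical, and visit order is not serial DFS order; only reachability and dedup are preserved.

```java
import java.util.*;
import java.util.concurrent.*;

// Sketch: parallel "DFS" as one ForkJoinTask per subtree, deduplicated
// through a shared concurrent visited set. Names here are illustrative.
class ParallelDFS extends RecursiveAction {
    private final Map<Integer, List<Integer>> adj; // assumed adjacency-list form
    private final int node;
    private final Set<Integer> visited;

    ParallelDFS(Map<Integer, List<Integer>> adj, int node, Set<Integer> visited) {
        this.adj = adj;
        this.node = node;
        this.visited = visited;
    }

    @Override
    protected void compute() {
        List<ParallelDFS> subtasks = new ArrayList<>();
        for (int next : adj.getOrDefault(node, List.of())) {
            if (visited.add(next)) {       // atomic claim: exactly one task descends
                subtasks.add(new ParallelDFS(adj, next, visited));
            }
        }
        invokeAll(subtasks);               // fork one task per unclaimed subtree
    }

    static Set<Integer> reachable(Map<Integer, List<Integer>> adj, int source) {
        Set<Integer> visited = ConcurrentHashMap.newKeySet();
        visited.add(source);
        ForkJoinPool.commonPool().invoke(new ParallelDFS(adj, source, visited));
        return visited;
    }
}
```

The atomic `visited.add` is what makes this safe: two tasks may discover the same node concurrently, but only the one whose `add` returns true recurses into it.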
The synchronization rule
Between levels, ALL workers must finish before the next level starts. Otherwise levels mix: a node at distance 3 might be processed before all distance-2 nodes finish, breaking distance correctness.
The visited-set bottleneck
Discovery rate >> uniqueness rate: most edges lead to already-visited nodes, so the visited set absorbs heavy contention. Choices:
| Set type | Pro | Con |
|---|---|---|
| Synchronized HashSet | Simple | Single lock = bottleneck |
| ConcurrentHashMap.newKeySet() | Atomic add | Memory-heavy |
| sync.Map (Go) | Lock-free reads | Slow if many writes |
| Bloom filter | Constant memory | False positives |
| Sharded sync.Mutex + map | Reduced contention | More complex |
Most production systems use ConcurrentHashMap.newKeySet() (Java) or a sharded sync.Mutex + map (Go).
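The sharded-lock option from the table translates to Java as lock striping; this is a hedged sketch (the `ShardedVisitedSet` class, shard count, and hash-to-shard mapping are assumptions, not a library API):

```java
import java.util.*;
import java.util.concurrent.locks.*;

// Sketch of a sharded visited set: hash each key to one of N independently
// locked HashSets, so threads touching different shards never contend.
// Analogous to the Go "sharded sync.Mutex + map" pattern in the table.
class ShardedVisitedSet {
    private final Set<Integer>[] shards;
    private final ReentrantLock[] locks;

    @SuppressWarnings("unchecked")
    ShardedVisitedSet(int shardCount) {
        shards = new Set[shardCount];
        locks = new ReentrantLock[shardCount];
        for (int i = 0; i < shardCount; i++) {
            shards[i] = new HashSet<>();
            locks[i] = new ReentrantLock();
        }
    }

    /** Returns true if the node was newly added (first visit). */
    boolean add(int node) {
        int i = Math.floorMod(Integer.hashCode(node), shards.length);
        locks[i].lock();
        try {
            return shards[i].add(node);    // same "true iff new" contract as newKeySet()
        } finally {
            locks[i].unlock();
        }
    }
}
```

The trade-off from the table holds: contention drops roughly by the shard count, at the cost of more moving parts than a single ConcurrentHashMap.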
When NOT to parallelize
- Small graphs: overhead exceeds the work.
- Sparse, high-diameter graphs: the frontier per level is small; parallelism is wasted on synchronization.
- Tree shape: recursion is the natural structure; per-level parallelism adds little.
The interview answer "Parallel BFS: process each level concurrently, synchronize between levels, and dedup with a concurrent visited set. Parallel DFS: harder; use a task pool. The visited set is the hot spot: choose a concurrent map for correctness, a Bloom filter for memory at extreme scale."
Implementations
Each level: process all nodes in parallel, collect next-level nodes, dedupe via the concurrent visited set, then swap in the next frontier. Use ForkJoinPool or parallelStream.
```java
import java.util.*;
import java.util.concurrent.*;
import java.util.stream.Collectors;

class ParallelBFS {
    public Map<Integer, Integer> traverse(Graph graph, int source) {
        Set<Integer> visited = ConcurrentHashMap.newKeySet();
        Map<Integer, Integer> distance = new ConcurrentHashMap<>();
        List<Integer> frontier = new ArrayList<>(List.of(source));
        visited.add(source);
        distance.put(source, 0);
        int level = 0;

        while (!frontier.isEmpty()) {
            final int currentLevel = level;
            // Process current level in parallel; collect next frontier.
            // visited.add returns true only for the first thread to claim a
            // node, so duplicates across the level are dropped atomically.
            List<Integer> nextFrontier = frontier.parallelStream()
                .flatMap(node -> graph.neighbors(node).stream()
                    .filter(neighbor -> visited.add(neighbor))) // atomic add
                .peek(neighbor -> distance.put(neighbor, currentLevel + 1))
                .collect(Collectors.toList());
            frontier = nextFrontier;
            level++;
        }
        return distance;
    }
}
```
Key points
- BFS parallelizes naturally per level: all nodes at level k processed concurrently
- DFS is harder: its recursive nature serializes; use a bounded task pool of subgraphs
- Visited set must be concurrent: CHM, sync.Map, or an atomic bitset
- Frontier transitions are barriers: synchronize all workers between levels
- Used in PageRank, social-graph queries, dependency resolution at scale
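To sanity-check the level-synchronous loop, here is a self-contained variant run on a small diamond graph. A plain adjacency `Map` stands in for the `Graph` type the snippet above assumes; `ParallelBFSDemo` is a hypothetical name for this sketch.

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.stream.*;

class ParallelBFSDemo {
    // Same per-level parallel expansion as above, inlined over a plain
    // adjacency map so the example is runnable on its own.
    static Map<Integer, Integer> distances(Map<Integer, List<Integer>> adj, int source) {
        Set<Integer> visited = ConcurrentHashMap.newKeySet();
        Map<Integer, Integer> distance = new ConcurrentHashMap<>();
        visited.add(source);
        distance.put(source, 0);
        List<Integer> frontier = List.of(source);
        int level = 0;
        while (!frontier.isEmpty()) {
            final int d = level + 1;       // distance for everything found this pass
            frontier = frontier.parallelStream()
                    .flatMap(n -> adj.getOrDefault(n, List.of()).stream()
                            .filter(visited::add))      // atomic dedup
                    .peek(n -> distance.put(n, d))
                    .collect(Collectors.toList());
            level++;
        }
        return distance;
    }
}
```

On the diamond 0→{1,2}, 1→{3}, 2→{3}, node 3 is discovered twice in the same level, but only one thread's `visited.add` wins, so it gets distance 2 exactly once.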
Follow-up questions
▸Why is BFS easier to parallelize than DFS?
▸Why does visited set need to be concurrent?
▸When does parallel BFS NOT win?
▸What about Bloom filter for visited set?
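On the Bloom-filter question: one concrete shape it can take is a lock-free bitset with multiple derived hashes. Everything here is an assumption for illustration (the `BloomVisitedSet` class, the two hash mixers, the fixed bit count); real deployments tune the bit count and hash count for a target false-positive rate.

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Sketch of a lock-free Bloom-filter visited set. add() can return false
// for a never-seen node (false positive), silently pruning that subtree --
// acceptable only when approximate traversal is tolerable.
class BloomVisitedSet {
    private final AtomicLongArray bits;
    private final int mask;                     // bit count is a power of two

    BloomVisitedSet(int log2Bits) {
        bits = new AtomicLongArray(1 << (log2Bits - 6)); // 64 bits per long
        mask = (1 << log2Bits) - 1;
    }

    private boolean setBit(int bit) {
        int idx = bit >>> 6;
        long m = 1L << (bit & 63);
        long prev = bits.getAndAccumulate(idx, m, (a, b) -> a | b); // atomic OR
        return (prev & m) == 0;                 // true iff the bit was newly set
    }

    /** Returns true only if the node was definitely not seen before. */
    boolean add(int node) {
        int h1 = node * 0x9E3779B9;             // two cheap derived hash mixers
        int h2 = Integer.reverse(node) * 0x85EBCA6B;
        boolean fresh1 = setBit(h1 & mask);     // evaluate both so both bits get set
        boolean fresh2 = setBit(h2 & mask);
        return fresh1 || fresh2;                // new iff any bit was previously unset
    }
}
```

Memory is constant (here 64 KiB of bits regardless of graph size), which is the draw at extreme scale; the cost is that a false positive drops a genuinely unvisited node.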
Gotchas
- !If workers don't synchronize between levels, processing crosses levels and BFS distances become wrong
- !The concurrent set's `add` semantics matter: `newKeySet().add()` returns true only if newly added, which is the dedup primitive
- !The level barrier requires ALL workers from level k to finish before level k+1 starts (wg.Wait() in Go)
- !DFS isn't level-based; naive parallelization breaks its ordering
- !Memory-bound when the frontier is huge; bound concurrency to avoid OOM
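One way to bound concurrency for a huge frontier, sketched under an assumption: submitting a parallel stream from inside a dedicated ForkJoinPool makes the stream use that pool's workers. This is widely relied on but not a documented guarantee of the Stream API, so treat it as a pragmatic trick; the `BoundedLevel` name and the toy `n * 2` expansion are placeholders.

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.stream.*;

class BoundedLevel {
    // Expand one frontier with a capped worker count instead of the
    // shared common pool, so one traversal can't swamp the JVM.
    static List<Integer> expand(List<Integer> frontier, int workers) throws Exception {
        ForkJoinPool pool = new ForkJoinPool(workers);  // hard cap on parallelism
        try {
            return pool.submit(() ->
                frontier.parallelStream()
                        .map(n -> n * 2)                // stand-in for neighbor expansion
                        .collect(Collectors.toList())
            ).get();                                    // blocks until the level finishes
        } finally {
            pool.shutdown();
        }
    }
}
```

The `get()` doubles as the level barrier, and `collect(toList())` preserves encounter order even on a parallel stream.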