Parallel MapReduce: Word Frequency Counter
Map: each worker processes a chunk of input → partial counts. Reduce: merge the partial counts into the final result. The map step is embarrassingly parallel; the reduce step is the synchronization point. Java's parallelStream, Python's multiprocessing.Pool.map, and Go's errgroup all express this.
The MapReduce template
1. Map: split input into chunks; each worker processes its chunk → partial result
2. Reduce: merge partial results into final answer
Embarrassingly parallel because the map step has no inter-worker communication. The reduce step is the synchronization point.
When the problem reads "count Y across X", think MapReduce: word counts, log entries, distinct visitors, sums by category. The pattern: per-chunk count, then merge. It generalizes from in-process (parallelStream) to distributed (Hadoop, Spark, BigQuery).
Per-worker accumulator vs shared state
Don't share the result map. Tempting: have all workers write to one ConcurrentHashMap. It looks simple, but lock contention dominates and the parallelism is wasted.
Right pattern: each worker has its OWN local accumulator. No locking. The final reduce merges, as sketched below. This is THE pattern for "concurrent computation that builds up state."
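A minimal sketch of that shape with a plain ExecutorService, assuming the input is already split into chunks (ChunkedWordCount and count are illustrative names, not a fixed API):

```java
import java.util.*;
import java.util.concurrent.*;

class ChunkedWordCount {
    // Each task builds its OWN HashMap (no locks); the caller does the reduce.
    static Map<String, Long> count(List<List<String>> chunks) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        try {
            List<Callable<Map<String, Long>>> tasks = new ArrayList<>();
            for (List<String> chunk : chunks) {
                tasks.add(() -> {
                    Map<String, Long> local = new HashMap<>(); // per-worker accumulator
                    for (String line : chunk)
                        for (String w : line.split("\\s+"))
                            if (!w.isBlank()) local.merge(w.toLowerCase(), 1L, Long::sum);
                    return local; // partial result, never shared while being built
                });
            }
            Map<String, Long> total = new HashMap<>();
            for (Future<Map<String, Long>> f : pool.invokeAll(tasks))
                f.get().forEach((k, v) -> total.merge(k, v, Long::sum)); // sequential reduce
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```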
Same idea as LongAdder (per-thread shards + sum on read).
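For a single counter rather than a map, java.util.concurrent.atomic.LongAdder bakes the same idea in:

```java
import java.util.concurrent.atomic.LongAdder;
import java.util.stream.IntStream;

LongAdder hits = new LongAdder();
IntStream.range(0, 1_000_000).parallel()
    .forEach(i -> hits.increment()); // threads hit separate cells, no CAS storm
System.out.println(hits.sum());     // 1000000; cells are summed on read
```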
Reduce strategies, ranked by efficiency
| Strategy | When |
|---|---|
| Sequential merge | Few workers, small partials |
| Tree-reduce | Many workers, expensive merge, log(N) depth |
| Concurrent merge target | Last resort, high contention |
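A sketch of the tree-reduce strategy over a list of partial maps, worthwhile when merges are expensive (TreeReduce is an illustrative name; each round merges disjoint pairs concurrently on the common pool):

```java
import java.util.*;
import java.util.concurrent.*;

class TreeReduce {
    // Merge partials pairwise until one remains: ~log2(N) rounds
    // instead of N-1 strictly sequential merges.
    static Map<String, Long> reduce(List<Map<String, Long>> partials) {
        List<Map<String, Long>> round = partials;
        while (round.size() > 1) {
            List<CompletableFuture<Map<String, Long>>> futures = new ArrayList<>();
            for (int i = 0; i + 1 < round.size(); i += 2) {
                Map<String, Long> a = round.get(i), b = round.get(i + 1);
                futures.add(CompletableFuture.supplyAsync(() -> {
                    b.forEach((k, v) -> a.merge(k, v, Long::sum)); // pair merges run in parallel
                    return a;
                }));
            }
            List<Map<String, Long>> next = new ArrayList<>();
            for (CompletableFuture<Map<String, Long>> f : futures) next.add(f.join());
            if (round.size() % 2 == 1) next.add(round.get(round.size() - 1)); // odd one carries over
            round = next;
        }
        return round.isEmpty() ? new HashMap<>() : round.get(0);
    }
}
```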
When parallel MapReduce isn't worth it
- Tiny inputs: the overhead exceeds the work.
- Reduce dominates: if 90% of the time is in the merge, parallelism in the map doesn't help (see the arithmetic below).
- Memory-bound: partial maps may be huge, and per-worker memory × N workers can OOM.
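To put a number on the second point: by Amdahl's law, speedup = 1 / ((1 - p) + p/N) for parallel fraction p and N workers. With 90% of the time in a serial merge, p = 0.1, so the speedup tops out at 1/0.9 ≈ 1.11×, no matter how many workers the map step gets.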
The interview signal
Junior: "I'd use parallelStream." (Fine, but doesn't show depth.)
Senior: "Map step parallelizes embarrassingly; reduce is the synchronization point. Per-worker accumulators avoid contention. Reduce can tree-reduce for log(N) depth. Generalizes to Hadoop/Spark, same pattern, just at machine scale instead of thread scale."
Implementations
parallelStream().collect(groupingBy(...)) runs the map step in parallel using ForkJoinPool.commonPool(). For most cases this is the right answer; don't hand-roll a replacement.
```java
import java.util.*;
import java.util.stream.Collectors;

// Map: split and normalize words in parallel; groupingBy's collector
// merges the per-thread maps during the reduce. groupingByConcurrent
// would collect into one ConcurrentHashMap instead; measure before preferring it.
Map<String, Long> wordCount = lines.parallelStream()
    .flatMap(line -> Arrays.stream(line.split("\\s+")))
    .filter(w -> !w.isBlank())
    .map(String::toLowerCase)
    .collect(Collectors.groupingBy(w -> w, Collectors.counting()));
```

For more control: split the input, fork map tasks, merge results. Useful when a custom reducer or non-stream input is needed.
```java
import java.util.*;
import java.util.concurrent.*;

class WordCountTask extends RecursiveTask<Map<String, Long>> {
    private final List<String> lines;
    private final int lo, hi;
    private static final int THRESHOLD = 1000; // below this, just count sequentially

    public WordCountTask(List<String> lines, int lo, int hi) {
        this.lines = lines; this.lo = lo; this.hi = hi;
    }

    @Override
    protected Map<String, Long> compute() {
        if (hi - lo < THRESHOLD) {
            return countSequential(lines.subList(lo, hi));
        }
        int mid = (lo + hi) / 2;
        WordCountTask leftTask = new WordCountTask(lines, lo, mid);
        WordCountTask rightTask = new WordCountTask(lines, mid, hi);
        leftTask.fork();                               // left half runs asynchronously
        Map<String, Long> right = rightTask.compute(); // right half runs in this thread
        Map<String, Long> left = leftTask.join();
        return mergeMaps(left, right);                 // merges form a tree-reduce
    }

    private Map<String, Long> countSequential(List<String> chunk) {
        Map<String, Long> m = new HashMap<>(); // per-task accumulator, no locking
        for (String line : chunk) {
            for (String word : line.split("\\s+")) {
                m.merge(word.toLowerCase(), 1L, Long::sum);
            }
        }
        return m;
    }

    private Map<String, Long> mergeMaps(Map<String, Long> a, Map<String, Long> b) {
        b.forEach((k, v) -> a.merge(k, v, Long::sum));
        return a;
    }
}
// Usage: ForkJoinPool.commonPool().invoke(new WordCountTask(lines, 0, lines.size()))
```

Key points
- Map: chunk input; each worker computes its partial result independently
- Reduce: merge partial results; the merge must be associative (and ideally commutative)
- Embarrassingly parallel: no inter-worker communication during map
- Reduce strategies: tree-reduce (parallel), sequential merge, ConcurrentHashMap target
- Foundation of distributed computing: Hadoop, Spark, and BigQuery all extend this
Follow-up questions
- Why a per-worker map instead of a shared ConcurrentHashMap?
- How does this connect to Hadoop / Spark?
- What's a tree-reduce?
- Why must reduce be associative?
Gotchas
- Mutable accumulators (like StringBuilder) need synchronization or per-worker copies
- parallelStream uses the common ForkJoinPool; heavy use starves other parallelStream callers
- The reduce step can be the bottleneck if partial maps are huge; consider a tree-reduce
- Floating-point addition is not strictly associative, so a parallel reduce can give slightly different results than a sequential one (demonstrated below)
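A quick demonstration of the last point:

```java
double a = 0.1, b = 0.2, c = 0.3;
System.out.println((a + b) + c); // 0.6000000000000001
System.out.println(a + (b + c)); // 0.6
// The grouping changes the rounding, so a parallel sum over doubles
// can differ from the sequential sum depending on how chunks merge.
```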