Fan-Out / Fan-In
Fan-out: distribute work to multiple workers concurrently. Fan-in: merge their results back into a single stream. The two together parallelise the middle of a pipeline. Standard shape: one input, N workers each doing the same transformation, one output combining results.
[Diagram: one input queue fans out to N workers, whose outputs fan in to one stream]
What it is
Fan-out: hand work out to N concurrent workers. Fan-in: merge their results back into a single stream (see diagram above).
The key property: workers are interchangeable. Any worker can take the next item; nobody owns specific data. Fast workers grab more items than slow ones (automatic load balancing).
This is the parallelisation pattern for the middle of a pipeline. It comes up everywhere: web crawlers (one URL queue, N fetchers), batch processing (one input list, N processors), HTTP fan-out to backends (one request, N parallel calls, one merged response).
Why fan-out works
Three properties combine. First, the worker is the thing being parallelised, not the data. Adding more workers raises throughput up to whatever bottleneck is downstream. Second, the work is independent: workers don't talk to each other. Third, work-stealing happens automatically when workers share an input queue: fast workers grab more items.
The result: parallelism scales by changing one number (the worker count) without changing the structure of the code.
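As a minimal sketch of that shape (one shared input queue, N interchangeable workers, one merged output), assuming a hypothetical `doWork` transformation:

```java
import java.util.*;
import java.util.concurrent.*;

public class FanOutFanIn {
    // Hypothetical per-item transformation, standing in for real work.
    static int doWork(int job) { return job * job; }

    public static void main(String[] args) throws Exception {
        // Fan-out: one shared input queue that N workers all poll.
        BlockingQueue<Integer> input =
            new LinkedBlockingQueue<>(List.of(1, 2, 3, 4, 5, 6, 7, 8));
        // Fan-in: one shared output queue that all workers write to.
        BlockingQueue<Integer> output = new LinkedBlockingQueue<>();

        int workers = 3; // the one number that sets the degree of parallelism
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        CountDownLatch done = new CountDownLatch(workers);
        for (int w = 0; w < workers; w++) {
            pool.submit(() -> {
                // Whoever polls next gets the next item: fast workers
                // naturally take more (automatic load balancing).
                Integer job;
                while ((job = input.poll()) != null) {
                    output.add(doWork(job));
                }
                done.countDown();
            });
        }
        done.await(); // wait-for-all before treating the output as complete
        pool.shutdown();

        List<Integer> results = new ArrayList<>(output);
        Collections.sort(results); // completion order is arbitrary; sort for display
        System.out.println(results);
    }
}
```

Changing `workers` from 3 to 8 changes the parallelism without touching anything else, which is the point of the pattern.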
Why bound it
The naive impulse: "more workers is always better". Wrong. Each worker uses memory, threads/goroutines, file descriptors. Each worker hits the downstream. Unbounded workers can:
- Exhaust file descriptors and trigger "too many open files".
- Overwhelm the downstream, turning its slowness into the caller's slowness.
- Burn CPU on context switching with no real throughput gain.
Pick a worker count and stick to it. For CPU-bound work, that is roughly the core count. For I/O-bound work, somewhere between 10 and 1000, depending on what the downstream can take.
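One way to enforce that cap even on an elastic executor is a counting semaphore around submission; a hedged sketch (the limit and the sleep are illustrative):

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedFanOut {
    public static void main(String[] args) throws Exception {
        int limit = 4;                       // pick one number and stick to it
        Semaphore inFlight = new Semaphore(limit);
        AtomicInteger current = new AtomicInteger(); // tasks running right now
        AtomicInteger peak = new AtomicInteger();    // highest observed concurrency

        ExecutorService pool = Executors.newCachedThreadPool(); // elastic, "unbounded"
        for (int i = 0; i < 20; i++) {
            inFlight.acquire();              // blocks submission once `limit` tasks are busy
            pool.submit(() -> {
                try {
                    int now = current.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    Thread.sleep(10);        // stand-in for real I/O-bound work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    current.decrementAndGet();
                    inFlight.release();      // frees a slot for the next submission
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println(peak.get() <= limit); // concurrency never exceeded the cap
    }
}
```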
Order preservation
Fan-out loses order. Workers run concurrently; whichever finishes first writes its result first. If the consumer needs results in input order, add explicit sequence numbers and re-sort downstream.
There are alternatives:
- Per-key partitioning. Hash each input by some key; route to one of N workers based on the hash. Within a key, order is preserved. Used for partitioned event streams (Kafka).
- Min-heap consumer. Consumer holds a min-heap, only emits the next sequence number when it arrives. Streaming preservation, but the heap can grow if one worker is slow.
- Drop the requirement. Often the downstream does not actually care about order. Confirm before adding complexity.
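A small sketch of the sequence-number approach, using hypothetical inputs: each worker tags its result with the input index, and the consumer re-sorts once everything has arrived:

```java
import java.util.*;
import java.util.concurrent.*;

public class OrderedFanIn {
    public static void main(String[] args) throws Exception {
        List<String> inputs = List.of("a", "b", "c", "d");
        // Each worker emits (sequence number, result); completion order is arbitrary.
        BlockingQueue<Map.Entry<Integer, String>> merged = new LinkedBlockingQueue<>();

        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < inputs.size(); i++) {
            final int seq = i; // the sequence number travels with the work item
            pool.submit(() -> merged.add(Map.entry(seq, inputs.get(seq).toUpperCase())));
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);

        // Re-sort downstream by sequence number to restore input order.
        List<Map.Entry<Integer, String>> all = new ArrayList<>(merged);
        all.sort(Map.Entry.comparingByKey());
        all.forEach(e -> System.out.print(e.getValue()));
        System.out.println();
    }
}
```

This batch version waits for everything; the min-heap variant above is the streaming equivalent of the same idea.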
Errors
Two strategies.
Cancel on first error. Workers share a context. The first worker to hit an error cancels the rest, and fan-in surfaces that error to the caller. The right choice when any one failure invalidates the whole result (transactional fan-out: load all parts of an order page; if any part fails, fail the page).
Collect errors and continue. Each worker reports its outcome (success or error). Fan-in returns a list of results, some good, some bad, and the caller decides. The right choice for batch processing where partial success is acceptable.
Pick at the call site. Don't mix; that confuses everyone reading the code.
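A sketch of the collect-and-continue strategy with `invokeAll`, assuming a hypothetical `parse` step that fails on some inputs:

```java
import java.util.*;
import java.util.concurrent.*;

public class CollectErrors {
    // Hypothetical per-item work that throws on bad input.
    static int parse(String s) { return Integer.parseInt(s); }

    public static void main(String[] args) throws Exception {
        List<String> jobs = List.of("1", "2", "oops", "4");

        try (ExecutorService pool = Executors.newFixedThreadPool(4)) {
            List<Future<Integer>> futures = pool.invokeAll(
                jobs.stream()
                    .map(j -> (Callable<Integer>) () -> parse(j))
                    .toList());

            int ok = 0, failed = 0;
            for (Future<Integer> f : futures) {
                try {
                    f.get();      // a task's error surfaces here as ExecutionException
                    ok++;
                } catch (ExecutionException e) {
                    failed++;     // record it and keep going; the caller decides
                }
            }
            System.out.println(ok + " ok, " + failed + " failed");
        }
    }
}
```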
A note on libraries
The standard libraries make this easy:
- Go: errgroup with SetLimit(n). Cancellation built in.
- Java: StructuredTaskScope (Java 21+) or ExecutorService.invokeAll.
- Python (asyncio): TaskGroup with a semaphore for the limit.
- Python (threads/processes): concurrent.futures.Executor.map.
Hand-rolling fan-out with raw channels/threads is fine for learning. For production code, use the helpers; they handle the cancellation and the wait-for-all that everyone gets wrong.
Implementations
Submit all tasks to the pool; collect futures; await results. Pool size caps concurrency. invokeAll is a one-shot fan-out helper that returns a list of futures, all of which are already done.
```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

try (ExecutorService pool = Executors.newFixedThreadPool(8)) {
    List<Callable<Result>> tasks = jobs.stream()
        .map(j -> (Callable<Result>) () -> doWork(j))
        .toList();

    List<Future<Result>> futures = pool.invokeAll(tasks);

    List<Result> results = new ArrayList<>();
    for (Future<Result> f : futures) {
        results.add(f.get()); // blocks if this future is not yet done
    }
    return results;
}
```
Key points
- Fan-out: one input channel/queue feeds N workers. Each worker reads independently.
- Fan-in: N output channels merge into one. With channels, this is N goroutines copying to a shared output channel.
- Order is not preserved by default; preserving it requires a sequence number and a re-sort downstream.
- Worker count = degree of parallelism. Cap it (errgroup.SetLimit, a semaphore, ThreadPoolExecutor's max_workers).
- Combine with retry, circuit breaker, and timeout for production-grade fan-out.
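For the timeout part, one option is the timed overload of invokeAll, which cancels whatever has not finished when the deadline passes; a sketch with illustrative tasks and deadline:

```java
import java.util.List;
import java.util.concurrent.*;

public class FanOutTimeout {
    public static void main(String[] args) throws Exception {
        try (ExecutorService pool = Executors.newFixedThreadPool(3)) {
            List<Callable<String>> tasks = List.of(
                () -> "fast-1",
                () -> "fast-2",
                () -> { Thread.sleep(10_000); return "slow"; });

            // Timed overload: waits at most 500 ms, then cancels unfinished tasks.
            List<Future<String>> futures =
                pool.invokeAll(tasks, 500, TimeUnit.MILLISECONDS);

            long cancelled = futures.stream().filter(Future::isCancelled).count();
            System.out.println(cancelled); // only the slow task missed the deadline
        }
    }
}
```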
Follow-up questions
- How many workers should run?
- How is order preserved?
- What if some workers are much slower than others?
- How are errors propagated?
Gotchas
- Forgetting to close the output channel leaves the consumer blocked forever.
- Unbounded worker count exhausts file descriptors and overwhelms the downstream.
- Order is lost unless explicitly preserved with sequence numbers.
- One slow worker doesn't slow the others when input is shared, but it does delay overall completion.
- Errors in one worker can leak resources if not propagated through cancellation.