Pipeline Pattern
A pipeline chains stages where each stage transforms data and passes it to the next. Each stage runs concurrently; bounded queues between stages give backpressure for free. Dominant pattern for stream processing, ETL, and any 'producer → transform → transform → consumer' shape.
Diagram: source → stage 1 → stage 2 → stage 3 → sink, with a bounded queue between each adjacent pair of stages.
What it is
A pipeline is a chain of stages where each stage takes input, transforms it, and hands the result to the next stage (see diagram above). Each stage runs concurrently. Bounded queues between stages provide automatic backpressure: if Stage 3 slows down, its incoming queue fills, Stage 2 blocks on push, that fills its incoming queue, Stage 1 blocks, and the source throttles. The whole pipeline naturally adapts to the slowest stage.
This is the dominant shape for stream processing: ETL pipelines, HTTP request handlers that hit multiple downstreams, image processing chains, video encoding. Anything where data flows through a series of transformations.
A stage can also fan out internally: if "parse" is slow, run 4 parser workers reading from the same input queue and pushing to the same output queue. Each stage's parallelism is independent.
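A minimal sketch of stage-internal fan-out in Java (the worker count, the squaring transform, and all names are illustrative). The subtlety is shutdown: with several workers on one input queue, a worker that sees the sentinel must re-enqueue it so its siblings also exit, and only the last worker forwards one sentinel downstream.

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

public class FanOutStage {
    public static void main(String[] args) throws InterruptedException {
        final int WORKERS = 4;                      // illustrative worker count
        final Integer DONE = Integer.MIN_VALUE;     // sentinel: no real item uses this value
        BlockingQueue<Integer> in = new ArrayBlockingQueue<>(100);
        BlockingQueue<Integer> out = new ArrayBlockingQueue<>(100);

        AtomicInteger live = new AtomicInteger(WORKERS);
        for (int w = 0; w < WORKERS; w++) {
            new Thread(() -> {
                try {
                    while (true) {
                        int n = in.take();
                        if (n == DONE) {
                            in.put(DONE);           // re-enqueue so the next worker also stops
                            if (live.decrementAndGet() == 0) out.put(DONE); // last one out signals downstream
                            return;
                        }
                        out.put(n * n);             // the stage's transform
                    }
                } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            }).start();
        }

        for (int i = 1; i <= 8; i++) in.put(i);
        in.put(DONE);

        int sum = 0;                                // results arrive in arbitrary order across workers,
        while (true) {                              // so aggregate with an order-independent operation
            int n = out.take();
            if (n == DONE) break;
            sum += n;
        }
        System.out.println(sum);                    // 1+4+9+...+64 = 204
    }
}
```

Note that ordering is lost across workers; if downstream needs ordered output, the fan-out stage must re-sequence.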
Why bounded queues
The instinct is to use unbounded queues so producers never block. This is wrong. With unbounded queues, a slow consumer causes the queue to grow without bound, eating memory until OOM. Bounded queues turn that failure mode into backpressure: producers block when full, the system slows down gracefully under load, and the bottleneck stage becomes visible when the queue between it and its predecessor stays full.
The right default is "small bounded queues, sized to absorb burstiness". Capacities like 100 or 1000 are typical.
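The blocking behavior is easy to see with Java's ArrayBlockingQueue (the capacity of 4 is deliberately tiny to make it visible): offer returns false on a full queue, and put would block until a consumer frees a slot, which is exactly the backpressure described above.

```java
import java.util.concurrent.*;

public class Backpressure {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Integer> q = new ArrayBlockingQueue<>(4);

        for (int i = 0; i < 4; i++) q.put(i);  // fills the queue to capacity
        System.out.println(q.offer(99));       // false: queue full, the producer cannot proceed
        // q.put(99) here would block until a consumer takes an item -- that is backpressure

        q.take();                              // a consumer frees one slot
        System.out.println(q.offer(99));       // true: the producer proceeds
    }
}
```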
Throughput
Pipeline throughput equals the throughput of the slowest stage. Doubling all the other stages does nothing if the slow stage is unchanged. The fix is to identify the bottleneck (full queue upstream of it, empty queue downstream) and scale that stage: run multiple workers reading from the same input.
This is Little's Law (L = λW) at work: in steady state every stage processes the same λ items per second, so the stage with the largest per-item time W holds the most concurrent items L.
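A worked example with made-up numbers: at a steady 1000 items/s, a stage that takes 5 ms per item holds 5 items in flight, while 1 ms and 2 ms stages hold 1 and 2.

```java
public class LittlesLaw {
    public static void main(String[] args) {
        // Hypothetical numbers: the bottleneck fixes the whole pipeline at 1000 items/s.
        long lambda = 1000;               // items/second through every stage in steady state
        long[] serviceMillis = {1, 5, 2}; // per-item service time W for each stage, in ms
        for (long w : serviceMillis) {
            // Little's Law: L = lambda * W (convert ms to seconds)
            System.out.println(lambda * w / 1000);
        }
    }
}
```

The 5 ms stage is where the concurrent items pile up, which is why its input queue is the one that stays full.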
Shutdown
The clean shutdown protocol: close the first stage's input. The first stage drains, closes its output. The next stage sees the closed channel, drains, closes its output. Cascade through to the last stage, which signals "done".
In Go, a range loop over a channel drains the remaining values and exits once the channel is closed, so closing the output is the right idiom. In other languages, send a sentinel value (None, a poison pill) and have each stage propagate it downstream when it sees it.
Errors
The hard part. Each item can fail in the middle of a stage. The options:
Cancel everything. Pass a context (or shared cancellation) through; first error cancels the whole pipeline. Right when a single bad item invalidates the run (transactional ETL).
Skip and log. The stage logs the error and continues with the next item. Right when items are independent and partial completion is fine (web crawler, batch image processor).
Dead-letter queue. The stage sends bad items to a separate error channel for later inspection or retry. Right when bad items are interesting and need auditing.
Pick one and apply it consistently. Mixing strategies across stages confuses everyone.
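One way to sketch the dead-letter option in Java (the queue names and the parse-to-int transform are made up for illustration): the stage catches the per-item failure, routes the bad item to a separate queue, and keeps processing.

```java
import java.util.concurrent.*;

public class DeadLetter {
    public static void main(String[] args) throws InterruptedException {
        final String DONE = "__done__";  // sentinel: assumes no real item equals this
        BlockingQueue<String> in = new ArrayBlockingQueue<>(100);
        BlockingQueue<Integer> out = new ArrayBlockingQueue<>(100);
        BlockingQueue<String> deadLetter = new ArrayBlockingQueue<>(100);

        Thread parse = new Thread(() -> {
            try {
                while (true) {
                    String s = in.take();
                    if (s.equals(DONE)) return;
                    try {
                        out.put(Integer.parseInt(s));   // the transform that can fail per item
                    } catch (NumberFormatException e) {
                        deadLetter.put(s);              // bad item goes to the DLQ; pipeline keeps going
                    }
                }
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        parse.start();

        for (String s : new String[]{"1", "2", "oops", "3"}) in.put(s);
        in.put(DONE);
        parse.join();

        System.out.println(out);         // [1, 2, 3]
        System.out.println(deadLetter);  // [oops]
    }
}
```

In a real system the dead-letter queue would be drained by a separate auditor or retry process rather than printed.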
Implementations
Java: three threads with two BlockingQueues between them. A sentinel (poison pill) propagates shutdown. Bounded queues give backpressure. Same shape as channels in Go, just more verbose.
import java.util.concurrent.*;

public class Pipeline {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Integer> a = new ArrayBlockingQueue<>(100);
        BlockingQueue<Integer> b = new ArrayBlockingQueue<>(100);
        final Integer DONE = Integer.MIN_VALUE;  // sentinel: no real item uses this value

        // Stage 1: generate 0..99, then the sentinel.
        Thread gen = new Thread(() -> {
            try {
                for (int i = 0; i < 100; i++) a.put(i);
                a.put(DONE);
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });

        // Stage 2: square each item; forward the sentinel downstream.
        Thread sq = new Thread(() -> {
            try {
                while (true) {
                    int n = a.take();
                    if (n == DONE) { b.put(DONE); return; }
                    b.put(n * n);
                }
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });

        // Stage 3: consume and print.
        Thread out = new Thread(() -> {
            try {
                while (true) {
                    int n = b.take();
                    if (n == DONE) return;
                    System.out.println(n);
                }
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });

        gen.start(); sq.start(); out.start();
        gen.join(); sq.join(); out.join();
    }
}

Key points
- Each stage is a concurrent worker (or pool) reading from one queue, writing to the next.
- Bounded queues between stages give automatic backpressure: a slow stage throttles upstream.
- The slowest stage is the bottleneck. Pipeline throughput equals the throughput of the slowest stage.
- Shutdown propagates downstream: close the input, the stage drains and closes its output.
- Errors are tricky: cancel the whole pipeline, or skip-and-log, or dead-letter the bad item. Pick a policy.
Follow-up questions
- What if one stage is much slower than the others?
- How are errors propagated through a pipeline?
- When should you use a pipeline vs fan-out?
Gotchas
- !Forgetting to close the output channel leaves downstream goroutines blocked forever
- !Unbounded queues between stages defeat backpressure; bound everything
- !Errors in the middle of the pipeline can leave stages running with no input/output
- !Slow consumer at the end backs up everything; monitor queue depths