Inter-Process Communication (Pipes & FIFOs)
Mental Model
A conveyor belt between two factory workers. One places items on the belt, the other picks them off. Limited space: when the belt is full, the placer waits; when it is empty, the picker waits. Neither worker sees the other directly. They only interact through the belt. Cut the belt and the placer gets an alarm.
The Problem
A shell pipeline hangs with no output. The producer is writing faster than the consumer reads, the 64 KB buffer fills, and write() blocks silently -- no error, no timeout. If the consumer crashes, the producer gets SIGPIPE and dies too. In another pipeline, two writers are shoving messages larger than 4096 bytes into the same pipe, breaking PIPE_BUF atomicity -- their output arrives interleaved and garbled.
Architecture
Type ls | grep foo | sort into a terminal and something beautiful happens.
Three processes launch simultaneously. The output of ls flows into grep, and the output of grep flows into sort. No temporary files. No polling. No coordination code. The kernel handles everything: buffering, flow control, and cleanup.
This is a pipe. The oldest IPC mechanism in Unix, dating back to 1973. And it's still the backbone of how processes talk to each other on every Linux system running today.
What Actually Happens
pipe2(fds, O_CLOEXEC) creates two file descriptors. fds[0] is the read end. fds[1] is the write end. Inside the kernel, it allocates a pipe_inode_info structure with a circular buffer of 16 page-sized slots (64KB total on systems with 4KB pages).
After fork(), the child inherits both fds. Here's the critical step: each side must immediately close the end it doesn't use. If the child is reading, close fds[1]. If the parent is writing, close fds[0].
Skip this step and the result is a deadlock. The child's read() never returns EOF because there's still an open write end -- the child's own copy of it.
Flow control is built in. When the buffer fills up, write() blocks. When the buffer empties, read() blocks. This natural backpressure is what makes cat hugefile | grep pattern work without consuming infinite memory -- cat can only write as fast as grep reads.
EOF signaling is elegant. When all write ends close, read() returns 0. The reader knows: "No more data coming." When all read ends close, write() triggers SIGPIPE -- the default action terminates the writer. This is how yes | head -5 works: head reads 5 lines, closes its stdin, and yes dies from SIGPIPE.
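These rules can be seen end to end in a short sketch (Python here for brevity; os.pipe2, os.fork, and os.read wrap the same syscalls as the C API): the parent writes, the child reads, each side closes the end it does not use, and the child's read() returns empty only after the last write end closes.

```python
import os

r, w = os.pipe2(os.O_CLOEXEC)        # r = read end, w = write end
pid = os.fork()

if pid == 0:                          # child: the reader
    os.close(w)                       # close the unused write end, or EOF never arrives
    chunks = []
    while True:
        data = os.read(r, 4096)
        if not data:                  # empty read: all write ends closed, EOF
            break
        chunks.append(data)
    os.close(r)
    os._exit(0 if b"".join(chunks) == b"hello" else 1)

os.close(r)                           # parent: the writer, close the unused read end
os.write(w, b"hello")
os.close(w)                           # this close is what delivers EOF to the child
_, status = os.waitpid(pid, 0)
print("child exit code:", os.waitstatus_to_exitcode(status))  # -> child exit code: 0
```

Comment out either close() of the unused end and the child blocks forever in read(), which is exactly the deadlock described above.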
Under the Hood
PIPE_BUF atomicity: the guarantee that makes pipelines work. POSIX guarantees that writes of PIPE_BUF bytes or fewer (4096 on Linux) are atomic. If three processes write to the same pipe simultaneously, each small write lands as a contiguous chunk. No interleaving. This is what prevents shell pipeline output from becoming a garbled mess.
Writes larger than PIPE_BUF may be split and interleaved with other writers' data. Large atomic messages require a different IPC mechanism.
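A hedged demonstration of that guarantee (the record size and counts are arbitrary choices for this sketch): two forked writers push fixed-size 64-byte records into one pipe. Each record is well under PIPE_BUF, so every record arrives contiguous, even though the two writers' records may interleave with each other at record boundaries.

```python
import os

RECORDS, SIZE = 100, 64               # 64 bytes per write, far below PIPE_BUF
r, w = os.pipe()
assert SIZE <= os.fpathconf(r, "PC_PIPE_BUF")   # PIPE_BUF is 4096 on Linux

pids = []
for fill in (b"A", b"B"):
    pid = os.fork()
    if pid == 0:                      # child writer
        os.close(r)
        for _ in range(RECORDS):
            os.write(w, fill * SIZE)  # <= PIPE_BUF: lands as one contiguous chunk
        os._exit(0)
    pids.append(pid)

os.close(w)                           # parent keeps only the read end
for pid in pids:
    os.waitpid(pid, 0)                # total traffic (12.8 KB) fits in the 64 KB buffer

data = b""
while True:
    chunk = os.read(r, 65536)
    if not chunk:
        break
    data += chunk
os.close(r)

# Since every write was atomic and exactly SIZE bytes, the stream is a
# sequence of whole records: each aligned slice must be homogeneous.
records = [data[i:i + SIZE] for i in range(0, len(data), SIZE)]
assert all(rec in (b"A" * SIZE, b"B" * SIZE) for rec in records)
print(len(records), "records, none torn")  # -> 200 records, none torn
```

Raise SIZE above 4096 and the homogeneity assertion can start failing, because the kernel is then free to split each write.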
Named pipes (FIFOs) add a filesystem address. mkfifo("/tmp/myfifo", 0666) creates a special file. Any process can open it for reading or writing. The kernel creates the same pipe buffer when both ends are opened.
But FIFOs have a subtle blocking behavior: open() itself blocks until both a reader and writer have opened the FIFO. A reader calling open() waits for a writer. A writer calling open() waits for a reader. With O_NONBLOCK, the reader succeeds immediately, but the writer gets ENXIO if no reader exists yet.
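A sketch of those open() rules, using a throwaway FIFO path (the path and cleanup are illustrative, not part of any API):

```python
import errno, os, tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.fifo")   # hypothetical scratch path
os.mkfifo(path, 0o666)

# No reader yet: a nonblocking open for writing is refused with ENXIO.
try:
    os.open(path, os.O_WRONLY | os.O_NONBLOCK)
    raise AssertionError("expected ENXIO")
except OSError as e:
    writer_errno = e.errno
assert writer_errno == errno.ENXIO

# A nonblocking open for reading succeeds immediately, even with no writer.
rfd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
# Now that a reader exists, the nonblocking writer succeeds too.
wfd = os.open(path, os.O_WRONLY | os.O_NONBLOCK)
os.write(wfd, b"ping")
assert os.read(rfd, 16) == b"ping"
os.close(rfd); os.close(wfd); os.unlink(path)
print("writer without reader ->", errno.errorcode[writer_errno])  # -> ENXIO
```

Without O_NONBLOCK, the first os.open() would simply hang until some other process opened the read side, which is the classic "my script froze on a FIFO" symptom.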
splice() and zero-copy. The splice() syscall moves data between a pipe and a file descriptor by transferring page references instead of copying data. The pipe_buffer simply points to the page cache page of the source file. No user-space copy at all. This is how modern proxy servers (HAProxy, Nginx) achieve zero-copy forwarding: splice(client_fd, pipe) then splice(pipe, backend_fd).
Capacity tuning. The default 64KB can cause excessive context switches in high-throughput pipelines. fcntl(fd, F_SETPIPE_SZ, new_size) can increase capacity up to /proc/sys/fs/pipe-max-size (default 1MB). Unprivileged users are also limited by pipe-user-pages-soft (default 16384 pages, about 64MB with 4KB pages, counted across all of a user's pipes).
Common Questions
Why must unused pipe ends be closed after fork()?
If the parent keeps the read end open while only intending to write, and the child keeps the write end open while only intending to read, the child's read() never gets EOF because there's still an open writer (the child itself). The parent never gets SIGPIPE because there's still a reader (the parent itself). Deadlock.
How does the shell implement "cmd1 | cmd2 | cmd3"?
It creates N-1 pipes for N commands. For each pipe: create with pipe(), fork the left command (redirect stdout to write end via dup2), fork the right command (redirect stdin from read end via dup2). Close all pipe fds in the parent. All commands run concurrently -- the kernel pipe buffer provides backpressure.
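That recipe, sketched for a two-command pipeline (a second "capture" pipe is added purely so the sketch can check its own result; echo and tr stand in for arbitrary commands):

```python
import os

r, w = os.pipe()        # the pipe between the two commands
r2, w2 = os.pipe()      # capture pipe so this sketch can verify the output

if os.fork() == 0:                       # left command: echo
    os.dup2(w, 1)                        # stdout -> pipe write end
    for fd in (r, w, r2, w2): os.close(fd)
    os.execvp("echo", ["echo", "hello world"])

if os.fork() == 0:                       # right command: tr
    os.dup2(r, 0)                        # stdin <- pipe read end
    os.dup2(w2, 1)
    for fd in (r, w, r2, w2): os.close(fd)
    os.execvp("tr", ["tr", "a-z", "A-Z"])

for fd in (r, w, w2):                    # crucial: parent drops its copies,
    os.close(fd)                         # or tr never sees EOF
out = b""
while True:
    chunk = os.read(r2, 4096)
    if not chunk:
        break
    out += chunk
os.close(r2)
os.wait(); os.wait()
print(out.decode(), end="")              # -> HELLO WORLD
```

Both commands are running concurrently by the time the parent starts reading; the shell does the same dance once per `|` in the pipeline.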
What is the "self-pipe trick"?
Signal handlers can only call async-signal-safe functions, but event loops need to react to signals alongside I/O events. The trick: create a pipe, add the read end to epoll, and in the signal handler call write(pipe_fd, "x", 1). The event loop wakes up and handles the signal in its normal context. Linux's signalfd() provides a cleaner alternative.
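A minimal sketch of the trick, with a single select() call standing in for the event loop:

```python
import os, select, signal

r, w = os.pipe2(os.O_NONBLOCK)      # nonblocking so the handler can never stall

def handler(signum, frame):
    os.write(w, b"x")               # the only work done in signal context

signal.signal(signal.SIGUSR1, handler)
os.kill(os.getpid(), signal.SIGUSR1)

ready, _, _ = select.select([r], [], [], 5)   # the "event loop"
assert r in ready                   # the signal shows up as ordinary I/O readiness
token = os.read(r, 1)
assert token == b"x"
print("signal handled in normal event-loop context")
```

The handler stays async-signal-safe (write() only); all real signal handling happens later, in the loop, with full access to program state.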
How Technologies Use This
A Node.js server spawning child processes to handle background tasks starts freezing its event loop. The main thread serving 10,000 concurrent HTTP requests stalls whenever it tries to read child process output. The alternative of writing to temporary files and polling with stat() adds disk I/O latency and wastes CPU.
The problem is synchronous I/O on the main thread. Without pipes registered in the event loop, reading child output means either blocking the thread (freezing all HTTP handling) or polling files on disk (slow and wasteful). The key insight is that kernel pipes are file descriptors that work with epoll, making them compatible with the non-blocking event loop.
When child_process.spawn() is called with stdio set to pipe, libuv creates kernel pipes via pipe2() and registers the read file descriptors with epoll in non-blocking mode. As the child writes output, epoll notifies the event loop, which reads chunks and emits data events without ever blocking the main thread. The kernel's built-in 64KB pipe buffer provides natural backpressure: if the parent falls behind, the child's write() blocks automatically.
A git checkout involving git-lfs takes minutes to filter 50,000 large files through an external smudge/clean process. Profiling shows virtually all the time is spent spawning processes, not filtering data. Each of the 50,000 fork/exec cycles costs 1-5ms of overhead, totaling minutes of pure process-spawning waste.
The original design spawns a fresh filter process per file, creating and destroying a pipe pair each time. The key insight is that a single persistent pipe pair can stream all 50,000 files through one long-running filter process, amortizing the fork/exec cost to a single invocation. PIPE_BUF atomicity (4096 bytes on Linux) guarantees that the packet-line framing headers land as a single contiguous chunk, preventing interleaved writes from corrupting the protocol.
The long-running filter protocol keeps one persistent pipe pair open between Git and the filter process. Git sends file contents through one pipe and reads filtered results from the other, using packet-line framing where each message is prefixed with its length. Since the length prefix is just 4 bytes, well within PIPE_BUF, the framing is always atomic.
A Go program using os/exec silently hangs forever waiting for a child process that has already exited. The parent's Read() on the child's stdout pipe never returns EOF. The hang is intermittent, occurring roughly 1 in 10,000 process spawns under high concurrency, making it nearly impossible to reproduce reliably.
Pipe file descriptors leak into unrelated child processes spawned by the Go runtime for DNS resolution or CGo calls. The leaked write end of the pipe keeps the kernel's reference count above zero, so the parent's read() never sees EOF even after the intended child exits. Without O_CLOEXEC, there is a race window between pipe() and a separate fcntl(FD_CLOEXEC) call where another goroutine can fork and inherit the descriptor.
Go prevents this by creating all pipes with pipe2(O_CLOEXEC), which atomically sets the close-on-exec flag during creation. No race window exists because the flag is set as part of the syscall itself. This eliminates the entire class of pipe-leak deadlock bugs. Always use pipe2() with O_CLOEXEC instead of pipe() followed by fcntl() in any multi-threaded or multi-goroutine program.
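The difference can be checked with the close-on-exec flag itself (a sketch; Python's os.pipe2 wraps the same syscall Go uses):

```python
import fcntl, os

# Atomic: pipe2(O_CLOEXEC) sets the flag as part of creation.
r, w = os.pipe2(os.O_CLOEXEC)
atomic_flags = fcntl.fcntl(w, fcntl.F_GETFD)
assert atomic_flags & fcntl.FD_CLOEXEC

# Racy two-step: between pipe creation and the fcntl() below, another
# thread could fork() and exec(), leaking both descriptors.
r2, w2 = os.pipe2(0)
racy_before = fcntl.fcntl(w2, fcntl.F_GETFD)
fcntl.fcntl(w2, fcntl.F_SETFD, racy_before | fcntl.FD_CLOEXEC)  # too late to be atomic
racy_after = fcntl.fcntl(w2, fcntl.F_GETFD)
assert not (racy_before & fcntl.FD_CLOEXEC)
assert racy_after & fcntl.FD_CLOEXEC

for fd in (r, w, r2, w2):
    os.close(fd)
print("flag set atomically at creation:", bool(atomic_flags & fcntl.FD_CLOEXEC))
```

The window between the two calls in the racy variant is exactly where the 1-in-10,000 Go hang described above lived.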
Same Concept Across Tech
| Technology | How it uses pipes | Key detail |
|---|---|---|
| Shell | Every `\|` operator creates an anonymous pipe between two processes | The shell closes its own copies of the pipe fds after forking each stage |
| Node.js | child_process.spawn() with stdio: 'pipe' connects stdin/stdout via pipes | Streams API wraps pipe fd in readable/writable |
| Docker | Container logs flow through a pipe from container stdout to the logging driver | docker logs reads from this pipe |
| Go | os/exec.Cmd.StdoutPipe() returns an io.ReadCloser backed by a pipe fd | Must read before calling Wait() or pipe buffer fills |
| Nginx | Upstream proxying uses pipes for buffering (proxy_buffering on) | Pipe buffer overflow causes 502 errors |
| systemd | Journal captures service stdout/stderr via pipes | StandardOutput=journal pipes through sd_journal |
Stack layer mapping (pipeline hanging):
| Layer | What to check | Tool |
|---|---|---|
| Application | Is the writer producing data? Is the reader consuming? | Application logs |
| Process | Is either side blocked in read() or write()? | cat /proc/PID/wchan, strace |
| Pipe buffer | Is the 64 KB buffer full (writer blocked) or empty (reader blocked)? | /proc/PID/fdinfo/FD (shows pipe buffer info) |
| Kernel | SIGPIPE delivery if reader closed? | strace -e signal |
| Filesystem | For named pipes: does the FIFO file exist? Permissions correct? | ls -la /path/to/fifo |
Design Rationale
Writing to a temp file and polling would drag in disk latency for data that never needs persistence. Shared memory works but demands explicit synchronization, overkill for a simple producer-consumer flow. Pipes give built-in flow control (write blocks when full, read blocks when empty) and automatic cleanup (kernel frees the buffer when both ends close), so connecting two processes requires zero configuration. PIPE_BUF atomicity is not optional -- without it, concurrent writers would interleave partial messages and every multi-producer shell pipeline would produce garbled output.
If You See This, Think This
| Symptom | Likely cause | First check |
|---|---|---|
| Pipeline hangs, no output | Pipe buffer full, writer blocked, consumer not reading | strace -e read,write both PIDs |
| Process dies with exit code 141 | SIGPIPE: writing to a pipe whose reader has closed | Handle SIGPIPE or check if consumer crashed |
| Data corruption in pipe output | Multiple writers writing > PIPE_BUF bytes, interleaving | Keep writes under 4096 bytes or use locking |
| Named pipe blocks on open() | No process has opened the other end yet | Open for read and write in separate processes |
| Shell pipeline exits with wrong status | Default: shell returns status of last command in pipeline | Use set -o pipefail to catch failures |
| Pipe buffer too small for throughput | Default 64 KB buffer causes frequent context switches | fcntl(F_SETPIPE_SZ) to increase up to pipe-max-size |
When to Use / Avoid
Use pipes when:
- Connecting the output of one process to the input of another (shell pipelines, log processing)
- Building producer-consumer patterns where both sides run concurrently
- Passing data between parent and child processes without files or sockets
Use named pipes (FIFOs) when:
- The two processes are not parent-child (no fork relationship)
- A filesystem path is needed as the rendezvous point
Avoid when:
- Bidirectional communication is needed (pipes are one-way, use sockets instead)
- Multiple writers write messages larger than PIPE_BUF (4096 bytes), breaking atomicity
- Data persistence is needed (pipes are ephemeral, data lost if either end crashes)
Try It Yourself
# Create a named pipe and demonstrate IPC
mkfifo /tmp/testpipe; echo 'hello' > /tmp/testpipe & cat /tmp/testpipe; rm /tmp/testpipe

# Show pipe buffer size for a process's pipe fds
for fd in /proc/$$/fd/*; do readlink $fd 2>/dev/null | grep -q pipe && cat $(dirname $fd)/../fdinfo/$(basename $fd) 2>/dev/null; done

# Trace data flow through a shell pipeline
strace -f -e trace=pipe2,read,write,dup2,close sh -c 'echo hello | tr a-z A-Z' 2>&1 | grep -E 'pipe|dup2|write'

# Check system-wide pipe limits
cat /proc/sys/fs/pipe-max-size && cat /proc/sys/fs/pipe-user-pages-soft

# Find all pipe fds for a process
ls -la /proc/$$/fd 2>/dev/null | grep pipe

# Generate heavy pipe traffic (trace with strace -e trace=splice to see whether zero-copy is used)
dd if=/dev/urandom bs=1M count=10 | cat > /dev/null
Debug Checklist
1. Check pipe buffer size: cat /proc/sys/fs/pipe-max-size
2. Find processes connected by a pipe: lsof -p <pid> | grep pipe
3. Check if a process is blocked on pipe read/write: cat /proc/<pid>/wchan
4. See pipe usage in strace: strace -e read,write -p <pid>
5. Check for SIGPIPE: strace -e signal -p <pid>
6. List named pipes: find /tmp -type p 2>/dev/null
Key Takeaways
- ✓ Writes of PIPE_BUF (4096 on Linux) bytes or fewer are guaranteed atomic -- they'll never be interleaved with other writers' data. This is a POSIX guarantee that every shell pipeline depends on. Writes larger than PIPE_BUF can be split and interleaved.
- ✓ When the last reader closes a pipe, the writer gets SIGPIPE (default: terminate). If SIGPIPE is ignored or handled, write() instead fails with EPIPE. That's how 'yes | head -5' works -- head closes its stdin after 5 lines, and yes gets killed by SIGPIPE.
- ✓ pipe2(fds, O_CLOEXEC | O_NONBLOCK) is the right way to create pipes. Between pipe() and a separate fcntl(), another thread can fork() and leak the fds to a child process. pipe2() is atomic.
- ✓ Default capacity is 64KB (16 pages), but you can grow it up to pipe-max-size (default 1MB) via fcntl(F_SETPIPE_SZ). Unprivileged users are capped by pipe-user-pages-soft (default 16384 pages, about 64MB across all of a user's pipes).
- ✓ splice() and tee() move data between pipes and fds without copying through userspace -- the kernel transfers page references directly. This is how efficient proxy servers achieve zero-copy I/O.
Common Pitfalls
- ✗ Mistake: not closing unused pipe ends after fork(). Reality: if the child still has the write end open, the child's own read() will never see EOF because there's still an open writer -- itself. Deadlock. Always close the end you don't use.
- ✗ Mistake: assuming read() returns the full amount. Reality: pipe reads can be short, especially with O_NONBLOCK. Always loop until you get the expected bytes or hit EOF/error.
- ✗ Mistake: opening a FIFO without understanding the block. Reality: open() on a FIFO blocks until the other end is also opened. O_NONBLOCK on a reader succeeds immediately; O_NONBLOCK on a writer fails with ENXIO if no reader exists.
- ✗ Mistake: assuming pipes are bidirectional. Reality: Unix pipes are strictly one-way. For bidirectional communication, use two pipes or a socketpair().
Reference
In One Line
Close unused pipe ends after fork, keep writes under PIPE_BUF for atomicity, and handle SIGPIPE -- that covers 90% of pipe bugs.