Signals & Signal Handling
Mental Model
Someone taps a shoulder while a person is mid-sentence at a desk. The sentence stops, an urgent note gets read, and then the original sentence picks up exactly where it left off. The danger: if the note says "go pour coffee" but the interrupted task was already holding the coffee pot, both hands reach for the same pot and everything freezes. Safe rule: glance at the note, scribble a reminder on a sticky pad, and deal with it properly once the current sentence is done.
The Problem
docker stop sends SIGTERM to PID 1, but bash is sitting there as PID 1 and silently ignores it. The app never hears about the shutdown. Ten seconds later, SIGKILL destroys everything -- in-flight data gone, transactions left half-written. Elsewhere, five child processes exit before the parent handles SIGCHLD, but standard signals collapse into a single pending bit. Four zombies pile up silently, leaking PIDs toward the 32,768 limit. And then the subtle one: a SIGTERM handler calls printf(), which grabs an internal lock, but the signal interrupted printf() mid-operation in the main thread. Deadlock. It reproduces once per 100,000 shutdowns and takes weeks to track down.
Architecture
A server is running. It needs to reload its config. There is no socket to connect to. No file it polls. The only option is to tap it on the shoulder and say "hey, re-read the configuration."
That tap is a signal. It is the oldest form of inter-process communication in Unix, and it is still everywhere. Nginx uses SIGHUP for config reload. Redis uses SIGTERM for graceful shutdown. Go uses SIGURG internally to preempt goroutines. The JVM turns SIGSEGV into NullPointerException.
But here is the catch. Signals are asynchronous. They can interrupt code at any point -- mid-syscall, mid-malloc, mid-printf. If a signal handler calls the wrong function at the wrong time, the result is corruption or deadlock. And the bug only reproduces when the signal arrives at exactly the wrong instruction. These are some of the hardest bugs in systems programming.
What Actually Happens
Signal delivery is a two-phase process.
Phase 1: Sending. When a signal is sent -- via kill(), tgkill(), or a kernel event like SIGSEGV -- the kernel adds the signal to the target's pending set and sets the TIF_SIGPENDING flag on the target thread. The signal is NOT delivered yet. It sits in the pending queue.
Phase 2: Delivery. The signal is delivered when the target thread transitions from kernel mode to user mode (returning from a syscall, interrupt, or page fault). At that transition point, do_signal() checks for pending unblocked signals and delivers them.
Delivery means the kernel manipulates the thread's saved register state:
- Push a signal frame onto the thread's stack (or onto an alternate stack registered via sigaltstack()). The frame contains the saved registers, the signal number, an optional siginfo_t, and a hidden return address pointing to a sigreturn() trampoline.
- Set the instruction pointer to the handler function.
- The handler runs. When it returns, sigreturn() restores the original registers.
- Execution resumes at the exact instruction that was interrupted.
The signal mask (sigprocmask()) controls which signals are blocked. Blocked signals stay pending until unblocked. During handler execution, the kernel automatically blocks the signal being handled (preventing recursive delivery) plus any extra signals specified in sa_mask.
Under the Hood
Standard vs real-time signals. Signals 1-31 are standard signals (SIGTERM, SIGCHLD, etc.) and are NOT queued: two pending SIGCHLDs collapse into one pending bit. Signals 32-64 are real-time signals (SIGRTMIN to SIGRTMAX) and ARE queued, with delivery in priority order (lowest number first). Real-time signals also carry data via siginfo_t.si_value. Note that glibc reserves the first few real-time numbers for its threading implementation, so SIGRTMIN is typically 34 -- always write SIGRTMIN+n rather than a raw number.
This has a practical consequence. If five child processes exit before the parent handles SIGCHLD, only one SIGCHLD may be delivered. The handler must call waitpid(-1, &status, WNOHANG) in a loop to reap ALL children. Handling exactly one child per SIGCHLD is a classic bug.
Async-signal-safety. When a signal interrupts a function, the handler runs in the same thread context. If the interrupted function holds an internal lock (like malloc's arena lock), calling that function from the handler deadlocks. The POSIX-guaranteed safe functions include: _exit, write, read, open, close, signal, kill, fork, exec*, waitpid, sem_post, and a few others. Notably absent: printf, malloc, free, syslog, pthread_mutex_lock.
The practical rule: do almost nothing in the handler. Set a volatile sig_atomic_t flag and return. Do the real work in the main loop.
signalfd() and the self-pipe trick. Event-driven servers cannot use traditional signal handlers because they disrupt the event loop. The classic solution is the "self-pipe trick": the handler writes a byte to a pipe that is in the epoll set. Linux's signalfd() is cleaner -- block the signal with sigprocmask(), then read signalfd_siginfo structs from a file descriptor in the event loop. No handler, no async-safety concerns. This is what systemd and many modern daemons use.
Process-directed vs thread-directed. kill() sends a signal to the process (thread group). The kernel picks an arbitrary thread that does not have the signal blocked. tgkill() / pthread_kill() targets a specific thread. Synchronous signals (SIGSEGV, SIGFPE, SIGBUS) are always delivered to the faulting thread -- anything else would not make sense.
Common Questions
How does SIGCHLD handling interact with waitpid() in a concurrent server?
Since standard signals are not queued, a single SIGCHLD may represent multiple child exits. The handler (or main loop) must call waitpid(-1, &status, WNOHANG) in a loop until it returns 0 or -1. Calling waitpid() exactly once per SIGCHLD is the classic mistake -- children get missed and zombies accumulate.
What happens when a signal hits a process blocked in a slow syscall?
If the signal has a handler: with SA_RESTART, the kernel automatically restarts the syscall after the handler returns (for most blocking calls). Without SA_RESTART, read() returns -1 with errno=EINTR. But some syscalls are NEVER restarted regardless of SA_RESTART -- connect(), poll(), nanosleep(), sem_wait() always return EINTR. If the signal's disposition is SIG_IGN or default-ignore, the syscall is not interrupted at all.
Why does SIGSEGV sometimes cause a stack overflow instead of a clean crash?
If a SIGSEGV handler itself accesses invalid memory, another SIGSEGV fires. If the handler runs on the main stack (no sigaltstack()) and SIGSEGV is not blocked during its own delivery, recursive faults exhaust the stack. The kernel detects this spiral and kills the process. Production crash handlers (like Google Breakpad) use sigaltstack() with a dedicated 64KB alternate stack to avoid this.
Can a process in D (uninterruptible sleep) state be killed?
No. Not even SIGKILL works. The D state means the process is waiting for I/O completion (typically disk or NFS) and the kernel cannot safely interrupt it without corrupting data structures. That is why hung NFS mounts and broken FUSE filesystems create unkillable processes. Kernel 2.6.25 introduced TASK_KILLABLE (a D-like state that responds to fatal signals), and many code paths have since been converted to use it, but not all.
How Technologies Use This
Running docker stop on a container should trigger a clean shutdown. Ten seconds later, Docker force-kills it with SIGKILL. Data in flight is lost, connections are severed, and database transactions are left inconsistent. The app never received the SIGTERM at all.
If the Dockerfile uses CMD ["bash", "-c", "python app.py"], bash runs as PID 1 inside the container. Two things go wrong. First, the kernel silently discards any signal sent to PID 1 whose disposition is the default action -- and bash installs no SIGTERM handler -- so the SIGTERM from Docker is dropped outright. Second, even when bash does receive a signal, it does not forward it to the foreground child it is waiting on. Either way, the app never knows about the shutdown. After the 10-second grace period, SIGKILL kills everything uncleanly.
Fix: use exec-form CMD ["python", "app.py"] so the app runs as PID 1 and receives SIGTERM directly. Or use Docker's --init flag, which injects tini as PID 1 -- tini forwards signals to all children, waits for them to exit, and then exits itself. Either approach enables graceful shutdown.
A production Nginx instance handles 20,000 active connections. The operator needs to rotate logs, reload config, and eventually shut down -- all without dropping a single request. There is no management socket, no REST API, no control plane. Just a running daemon process.
Signals provide the entire control interface. SIGHUP tells the master to re-read nginx.conf, fork new workers, and drain old ones. SIGUSR1 reopens all log files so logrotate works -- without it, Nginx keeps writing to the deleted old log file since it holds an open fd. SIGQUIT triggers graceful shutdown where workers finish in-flight requests (30+ seconds for streaming responses) before exiting.
Three signals control the entire lifecycle: SIGHUP for reload, SIGUSR1 for log rotation, SIGQUIT for graceful stop. Each triggers a completely different code path in the master process, and the result is zero-downtime config changes and log rotation with no management overhead.
A production PostgreSQL server needs to shut down, but a long-running analytics query has been active for 3 hours. A gentle stop waits for it to finish -- that could be hours more. A hard stop kills everything and requires WAL recovery on restart. Something in between is needed.
PostgreSQL maps three signals to three shutdown modes. SIGTERM is smart shutdown: stop accepting new connections but wait for existing clients to disconnect on their own. SIGINT is fast shutdown: forcibly disconnect all clients, roll back in-progress transactions, and exit cleanly with a checkpoint -- no recovery needed on restart. SIGQUIT is immediate: kill all backends without checkpoint, requiring WAL replay (30 seconds to several minutes) on next startup.
Choose SIGTERM for planned maintenance with flexible timing, SIGINT for urgent restarts requiring a clean state, and SIGQUIT only for emergencies. The postmaster also relies on SIGCHLD with a waitpid(-1, WNOHANG) loop to reap crashed backends, since standard signals do not queue.
Redis is single-threaded. If the SIGTERM handler tries to save the dataset before exiting, it blocks the event loop for 5-30 seconds. Every client stalls. If the handler calls malloc or writes to the log, it risks deadlock or corruption when the signal interrupts those same functions mid-operation.
Redis uses the flag-and-check pattern. The SIGTERM handler sets a single variable (server.shutdown_asap = 1) and returns immediately. No file I/O, no malloc, no logging -- fully async-signal-safe. The main event loop checks this flag on every iteration and performs the actual RDB/AOF save safely outside signal context.
For crashes, Redis installs a SIGSEGV/SIGBUS handler that uses only write() to stderr (async-signal-safe) to dump a stack trace, memory stats, client info, and the last few commands processed. This crash report is often enough to diagnose the bug without a multi-gigabyte core dump, saving hours of debugging time.
Before Go 1.14, a goroutine in a tight CPU loop with no function calls could never be preempted. GC pauses stretched from milliseconds to seconds as the runtime waited for the rogue goroutine to yield. The scheduler was powerless to interrupt it.
Go now uses SIGURG for asynchronous preemption. When the garbage collector or scheduler needs to stop a goroutine, it sends SIGURG to the OS thread running it. The signal handler saves the goroutine's register state, inserts an async preemption point, and allows the scheduler to switch immediately. SIGSEGV and SIGBUS are converted into Go panics with full stack traces, so nil pointer dereferences produce readable errors instead of cryptic core dumps.
For application-level signals, signal.Notify() delivers them to a channel, converting async interrupts into synchronous channel reads that fit Go's concurrency model. This lets a server drain 10,000 in-flight requests on SIGTERM using standard select/channel patterns instead of unsafe signal handlers.
Java code can hit a null dereference at almost any object access on the unhappy path. Adding an explicit if-null check before every pointer access would cost 5-15% of CPU on pointer-heavy code. Yet NullPointerException appears with a clean stack trace and no performance penalty on the happy path.
The JVM registers a SIGSEGV handler at startup. When a null dereference triggers a hardware fault at address 0x0, the handler inspects the faulting address, recognizes the pattern as a managed-object null access, constructs a NullPointerException, and unwinds through the exception framework. SIGFPE becomes ArithmeticException for division by zero. The hardware does the null check for free -- no branch needed.
SIGQUIT (kill -3 or Ctrl+backslash) dumps all thread stacks and lock states to stderr -- invaluable for diagnosing production deadlocks without attaching a debugger. SIGTERM and SIGINT trigger Java's shutdown hooks, giving applications a window to close database connections, flush buffers, and deregister from service discovery before exit.
Kubernetes sends SIGTERM to a Node.js server handling 5,000 in-flight HTTP requests. Without a signal handler, Node exits immediately. All 5,000 clients receive connection-reset errors, triggering retries that cascade into upstream services and amplify the outage.
Signals are asynchronous but Node's event loop is single-threaded. libuv bridges this gap with the self-pipe trick: the signal handler writes a byte to an internal pipe polled by the event loop, converting the async interrupt into a safe callback on the next iteration. No async-signal-safety concerns in user code -- the handler runs as a normal event loop callback.
Use process.on('SIGTERM', handler) to stop accepting new connections, wait for in-flight requests to complete (5-30 seconds with a timeout), close database pools, then exit cleanly. Kubernetes gives pods 30 seconds (terminationGracePeriodSeconds) between SIGTERM and SIGKILL specifically to allow this drain.
Stopping a service that has forked children and spawned background helpers. The main process exits cleanly, but orphaned children keep running -- holding ports, consuming memory, and writing to log files. Sending SIGTERM to just the main PID missed everything else.
systemd sends SIGTERM to every process in the service's cgroup (KillMode=control-group), not just the main PID. This reaches workers, helpers, and grandchildren that setsid() and double-fork cannot hide. After SIGTERM, systemd waits TimeoutStopSec (default 90 seconds) for graceful exit. If any process survives, SIGKILL finishes it off.
Internally, systemd avoids traditional signal handlers entirely. It uses signalfd() to receive signals as readable events on a file descriptor, integrated into its epoll-based event loop. This sidesteps every async-signal-safety concern and lets systemd process SIGCHLD, SIGTERM, and SIGHUP in the same unified dispatch that handles socket activation, timer events, and D-Bus messages.
Same Concept Across Tech
| Concept | Docker | JVM | Node.js | Go | K8s |
|---|---|---|---|---|---|
| Graceful shutdown | docker stop sends SIGTERM to PID 1; use exec-form CMD or --init (tini) | Runtime.addShutdownHook runs on SIGTERM/SIGINT | process.on('SIGTERM') to stop accepting, drain, exit | signal.Notify(ch, syscall.SIGTERM) to channel-based drain | terminationGracePeriodSeconds (default 30s) between SIGTERM and SIGKILL |
| Config reload | docker kill -s HUP $container sends SIGHUP to PID 1 | No standard signal for reload; use JMX or REST endpoint | process.on('SIGHUP') for custom reload logic | signal.Notify for SIGHUP; viper.WatchConfig() for file-based | kubectl rollout restart for pod-level; SIGHUP for in-pod daemons |
| Crash handling | Container restart policy handles crashes; tini reaps zombies | SIGSEGV becomes NullPointerException; SIGQUIT dumps all thread stacks | uncaughtException and unhandledRejection handlers | SIGSEGV/SIGBUS become Go panics with stack traces | restartPolicy + liveness probes for crash recovery |
| Child reaping | PID 1 must reap zombies; tini or --init handles this | N/A -- JVM threads are not child processes | child_process.on('exit') must be handled to avoid zombies | cmd.Wait() must be called for every exec.Command | Container init (PID 1) must handle SIGCHLD |
| Internal signal use | N/A | SIGSEGV for NullPointerException; SIGFPE for ArithmeticException | libuv self-pipe trick converts signals to event loop callbacks | SIGURG for goroutine preemption (Go 1.14+) | N/A |
| Stack Layer | Mechanism |
|---|---|
| Application | sigaction() installs handlers; signalfd() for event-loop integration |
| Language runtime | Go: SIGURG for preemption; JVM: SIGSEGV for NPE; Node: libuv self-pipe trick |
| Kernel signal subsystem | Pending queue (sigpending), delivery on kernel-to-user transition, TIF_SIGPENDING flag |
| Process lifecycle | SIGCHLD on child exit; SIGHUP on terminal hangup; SIGKILL/SIGSTOP uncatchable |
| Hardware | SIGSEGV from MMU page fault; SIGFPE from FPU exception; SIGBUS from alignment fault |
Design rationale: Signals solve a narrow problem -- notifying a process asynchronously without any pre-established communication channel -- and they solve it with almost no overhead. But running code in an interrupted context is inherently dangerous, since the handler shares locks and state with whatever it just interrupted. signalfd() and the self-pipe trick both exist to convert that dangerous async model into the synchronous event-loop model where everything is safe to call.
If You See This, Think This
| Symptom | Likely Cause | First Check |
|---|---|---|
| docker stop takes exactly 10 seconds then force-kills | PID 1 (shell) not forwarding SIGTERM to app | Check Dockerfile CMD form; use exec-form or --init |
| Zombie processes accumulating (Z state in ps) | SIGCHLD handler not looping waitpid() with WNOHANG | grep SigCgt /proc/$PARENT_PID/status to verify SIGCHLD is caught |
| Intermittent deadlock on graceful shutdown | Signal handler calls non-async-signal-safe function (printf, malloc) | strace the hang; check handler code for unsafe calls |
| Process ignores SIGKILL (stuck in D state) | Uninterruptible sleep waiting for I/O (NFS, FUSE, disk) | cat /proc/$PID/wchan to see what kernel function it is blocked in |
| Slow syscalls return EINTR unexpectedly | SA_RESTART not set on sigaction, or non-restartable syscall interrupted | Check sigaction flags; wrap non-restartable calls in EINTR retry loop |
| Only one child reaped despite multiple exits | Standard SIGCHLD not queued; handler calls waitpid once | Change handler to waitpid(-1, &status, WNOHANG) in a while loop |
When to Use / Avoid
- Use when implementing graceful shutdown -- catch SIGTERM to drain connections and flush buffers before exit
- Use when building daemon lifecycle control -- SIGHUP for config reload, SIGUSR1 for log rotation
- Use when reaping child processes -- SIGCHLD handler with waitpid() loop prevents zombie accumulation
- Use signalfd() in event-driven servers to avoid async-signal-safety concerns entirely
- Avoid calling non-async-signal-safe functions (printf, malloc, syslog) inside signal handlers
- Avoid relying on signal queuing for standard signals (1-31) -- use real-time signals if delivery guarantee is needed
Try It Yourself
# List all signals with numbers
kill -l

# Send SIGUSR1 to a process
kill -USR1 $(pidof nginx)

# View signal masks for a process (pending, blocked, ignored, caught)
grep Sig /proc/$$/status

# Decode signal mask hex to binary (e.g., SigCgt)
python3 -c "mask=0x$(grep SigCgt /proc/$$/status | awk '{print $2}'); [print(f' Signal {i}: caught') for i in range(1,65) if mask & (1<<(i-1))]"

# Trace signal delivery to a running process
strace -p $(pidof sleep) -e trace=signal 2>&1

# Send signal to all processes in a process group
kill -TERM -$(ps -o pgid= -p $PID | tr -d ' ')

Debug Checklist
1. grep Sig /proc/$PID/status -- view pending, blocked, ignored, and caught signal bitmasks
2. strace -e trace=signal -p $PID -- trace signal delivery and mask changes in real time
3. kill -l -- list all signal names and numbers
4. kill -0 $PID -- test if a process exists without sending a signal
5. cat /proc/$PID/status | grep -E 'SigPnd|SigBlk' -- check for stuck pending or over-blocked signals
6. strace -e trace=rt_sigaction -p $PID 2>&1 | head -20 -- see which signals a process has handlers for
Key Takeaways
- ✓ Signal handlers literally hijack your thread's control flow. The kernel saves registers to a ucontext_t on the stack, redirects execution to the handler, and on return a hidden sigreturn() trampoline restores the original context. Your code resumes as if nothing happened.
- ✓ Only a small set of functions is async-signal-safe. printf(), malloc(), and mutex operations are NOT among them. In a handler, restrict yourself to setting a volatile sig_atomic_t flag, calling write() on a pipe, or sem_post(). Everything else risks deadlock or corruption.
- ✓ SA_RESTART makes some syscalls auto-restart after signal delivery. read(), write(), wait() are restartable. But connect(), poll(), sem_wait(), and nanosleep() are never restarted -- they always return EINTR. Memorize which ones restart and which do not.
- ✓ signalfd() turns signals into file descriptor events. Block the signal with sigprocmask(), then read signalfd_siginfo structs from the fd in your event loop. No async handler needed. This is the modern Linux way to handle signals in event-driven servers.
- ✓ SIGKILL and SIGSTOP cannot be caught, blocked, or ignored. Period. But here is the catch: a process in uninterruptible sleep (D state) will not respond even to SIGKILL until it leaves that state. That is why hung NFS mounts create unkillable processes.
Common Pitfalls
- ✗ Using signal() instead of sigaction(). Mistake: thinking they are equivalent. Reality: signal() has undefined behavior regarding handler reset (SA_RESETHAND semantics vary across systems) and does not let you control SA_RESTART. Always use sigaction().
- ✗ Calling printf, malloc, or syslog in a signal handler. This corrupts internal data structures when the handler interrupts those same functions mid-operation. The fix: set a flag in the handler, do the real work in the main loop.
- ✗ Not blocking signals during critical sections. If a handler fires between a check and an update of shared data, the handler sees inconsistent state. Use sigprocmask() to block signals around critical sections.
- ✗ Assuming signals are queued. They are not (for standard signals 1-31). If 5 children exit before the parent handles SIGCHLD, only one SIGCHLD may be delivered. The handler must call waitpid() in a WNOHANG loop to reap ALL exited children, not just one.
Reference
In One Line
Signal handlers should do almost nothing -- set a flag and get out, or skip handlers entirely with signalfd() -- and never forget that SIGCHLD does not queue, so waitpid() must loop.