Process Groups, Sessions & Job Control
Mental Model
A school floor with classrooms and a PA system. Each classroom is a process group -- students working on the same project. The PA is the controlling terminal. When the principal makes an announcement (Ctrl+C), only the classroom currently presenting (foreground group) hears it. End of the school day (terminal hangup) -- the floor manager (session leader) tells every classroom to pack up. But a classroom that relocated to a separate building across the street (setsid) never hears the PA at all.
The Problem
systemctl stop reports a service as stopped, but grandchildren that called setsid() escaped the process group kill. Over days, 50+ rogue workers accumulate, leaking 12 GB of memory and holding 3,000 file descriptors. An SSH disconnect sends SIGHUP to the session leader and kills a critical background pipeline mid-run -- 6 hours of processed output, corrupted. Three processes in a pipeline each landed in a separate group, so Ctrl+C only reaches one; the other two spin at 100% CPU eating pipe buffer memory.
Architecture
Type grep foo file | sort | head and hit Ctrl+C. All three processes die at the same time.
Stop and think about that. No signal was sent to each process individually. Two keys were pressed, and the kernel figured out that those three processes belong together and killed them as a unit. How?
Then the SSH session closes. The background make job dies. But the process started under tmux keeps running. Same logout, different outcomes. What determines which processes survive?
The answer to both questions is the same: process groups and sessions. They are the invisible layer between the terminal and the processes, and once they become visible, a dozen things that seemed like magic suddenly make sense.
What Actually Happens
The Unix process hierarchy has three levels: processes, process groups, and sessions.
Process groups. When a shell creates a pipeline, it puts all processes in that pipeline into the same process group. The shell calls setpgid() to assign the first process's PID as the PGID for all of them. Hitting Ctrl+C causes the terminal driver to send SIGINT to every process with that PGID. One keystroke, all processes in the pipeline.
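What the shell does for a pipeline can be approximated from Python. This is a minimal sketch, not how any particular shell is implemented: the helper names spawn_pipeline and signal_group are hypothetical, and two sleep commands stand in for real pipeline stages.

```python
import os
import signal
import subprocess


def spawn_pipeline():
    """Start two connected processes that share one process group."""
    # The first process becomes the group leader: its PGID equals its PID.
    first = subprocess.Popen(
        ["sleep", "60"],
        stdout=subprocess.PIPE,
        preexec_fn=lambda: os.setpgid(0, 0),
    )
    # The second process joins the leader's group and reads from its stdout.
    second = subprocess.Popen(
        ["sleep", "60"],
        stdin=first.stdout,
        preexec_fn=lambda: os.setpgid(0, first.pid),
    )
    first.stdout.close()  # only the second process keeps the read end open
    return first, second


def signal_group(pgid, sig=signal.SIGINT):
    """Deliver one signal to every member of the group, as Ctrl+C does."""
    os.killpg(pgid, sig)
```

Calling signal_group(first.pid) after spawn_pipeline() sends SIGINT to both processes with a single call, which is the same fan-out the terminal driver performs for the foreground group.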
Sessions. A session is a collection of process groups. At login, the login process calls setsid() to create a new session. The login shell becomes the session leader (its PID equals its SID). Each command or pipeline launched gets its own process group within this session.
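The relationship "session leader's PID equals its SID" is easy to observe. The sketch below forks a child, calls setsid() in it, and reports the child's IDs back through a pipe; the function name ids_after_setsid is made up for illustration.

```python
import os


def ids_after_setsid():
    """Fork a child, call setsid() in it, return its (pid, sid, pgid)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: a freshly forked process is not a group leader,
        # so setsid() is allowed to succeed.
        try:
            os.close(read_fd)
            os.setsid()
            report = f"{os.getpid()} {os.getsid(0)} {os.getpgid(0)}"
            os.write(write_fd, report.encode())
        finally:
            os._exit(0)
    # Parent: collect the report and reap the child.
    os.close(write_fd)
    data = os.read(read_fd, 64)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return tuple(int(field) for field in data.split())
```

All three returned values are equal: after setsid() the child leads both a new session and a new process group, and its SID no longer matches the parent's.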
The controlling terminal. Each session can have one controlling terminal -- the tty associated with the login. The terminal driver uses this to route keyboard signals. The shell uses tcsetpgrp() to designate one process group as the foreground group. Only the foreground group receives Ctrl+C, Ctrl+Z, and Ctrl+\.
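The read side of tcsetpgrp() is tcgetpgrp(), which reports the terminal's current foreground group. A small sketch, assuming /dev/tty as the way to reach the controlling terminal; the function names are hypothetical, and None is returned when the process has no controlling terminal (cron, CI, daemons).

```python
import os


def foreground_pgid(tty_path="/dev/tty"):
    """Return the foreground PGID of the controlling terminal, or None."""
    try:
        fd = os.open(tty_path, os.O_RDONLY | os.O_NOCTTY)
    except OSError:
        return None  # no controlling terminal available
    try:
        return os.tcgetpgrp(fd)
    except OSError:
        return None
    finally:
        os.close(fd)


def in_foreground():
    """True if this process belongs to the terminal's foreground group."""
    fg = foreground_pgid()
    return fg is not None and fg == os.getpgrp()
```

A program can use a check like in_foreground() to decide whether prompting on the terminal is safe or would stop it with SIGTTIN.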
The hangup cascade. When the controlling terminal hangs up (SSH disconnects, terminal window closes), the kernel sends SIGHUP to the session leader. The shell catches it and forwards SIGHUP to every process group it manages. Then the shell exits. This cascade is why background jobs die on logout -- they are still in the same session.
Processes started under tmux or screen live in different sessions. The hangup from the terminal's session cannot reach them.
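The shell's half of the hangup cascade can be sketched as a toy job table. This is an illustration under assumptions, not bash's actual code: the names jobs, launch_job, forward_hangup, and install_handler are all hypothetical.

```python
import os
import signal
import subprocess

jobs = []  # PGIDs of the job groups this toy shell manages


def launch_job(argv):
    """Start a job in its own process group and remember the PGID."""
    def child_setup():
        os.setpgid(0, 0)  # new group, led by the child
        signal.signal(signal.SIGHUP, signal.SIG_DFL)  # undo any inherited ignore
    proc = subprocess.Popen(argv, preexec_fn=child_setup)
    jobs.append(proc.pid)  # group leader, so PGID == PID
    return proc


def forward_hangup(signum=None, frame=None):
    """On SIGHUP, pass the hangup on to every managed job group."""
    for pgid in jobs:
        try:
            os.killpg(pgid, signal.SIGHUP)
            os.killpg(pgid, signal.SIGCONT)  # wake stopped jobs so they see it
        except ProcessLookupError:
            pass  # the group is already gone


def install_handler():
    """Run forward_hangup when this process itself receives SIGHUP."""
    signal.signal(signal.SIGHUP, forward_hangup)
```

A job started in a different session never appears in a table like jobs, which is why the cascade cannot reach it.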
Under the Hood
Foreground vs background groups. Only the foreground process group can read from the terminal. If a background process tries to read() from it, the kernel sends it SIGTTIN (default action: stop). This prevents background jobs from stealing terminal input. Writes are treated differently: background processes may write to the terminal by default, and only when the terminal's tostop flag is set do background writers get SIGTTOU. The fg command calls tcsetpgrp() to promote a background group to foreground and sends SIGCONT if the job was stopped.
The setpgid() race. After fork(), there is a race. The shell wants to set the child's PGID, and the child wants to set its own PGID. If the shell sends a signal to the process group before both sides have completed their setpgid() calls, things go wrong. Bash and zsh solve this by calling setpgid() from both parent and child. If one call fails with EACCES (child already exec'd), that is fine -- the other side already handled it.
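The double call can be sketched as follows. This is a minimal illustration of the pattern, with signal.pause() standing in for the exec() a real shell would perform; the function name fork_into_new_group is hypothetical.

```python
import errno
import os
import signal


def fork_into_new_group():
    """Fork a child and place it in its own group from both sides."""
    pid = os.fork()
    if pid == 0:
        # Child side: become a group leader before doing anything else.
        try:
            os.setpgid(0, 0)
            signal.pause()  # stand-in for exec() of the real command
        finally:
            os._exit(0)
    # Parent side: make the same call, so the group exists no matter
    # which process gets scheduled first.
    try:
        os.setpgid(pid, pid)
    except OSError as exc:
        # EACCES means the child already exec'd, and it set its own
        # PGID before doing so. Anything else is a real error.
        if exc.errno != errno.EACCES:
            raise
    return pid
```

When fork_into_new_group() returns, the parent can signal the group immediately: either its own setpgid() succeeded or the child's already had.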
Orphaned process groups. When a process group becomes orphaned (no member has a parent in a different group within the same session), the kernel checks if any member is stopped. If so, it sends SIGHUP followed by SIGCONT to the entire group. The reasoning: stopped processes in an orphaned group can never be resumed by job control (the shell is gone), so SIGHUP gives them a chance to clean up, and SIGCONT ensures they actually wake up to handle it.
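The orphan rule is compact enough to express as a predicate over a process table. The sketch below works on a plain dictionary snapshot rather than live kernel state; the function name and the PIDs used with it are invented for illustration.

```python
def is_orphaned(pgid, procs):
    """Return True if process group `pgid` is orphaned.

    `procs` maps pid -> (ppid, pgid, sid) for a snapshot of processes.
    A group is NOT orphaned when at least one member has a parent that
    is in a different group of the same session.
    """
    for pid, (ppid, group, sid) in procs.items():
        if group != pgid:
            continue
        parent = procs.get(ppid)
        if parent is None:
            continue  # parent outside the snapshot: cannot anchor the group
        _, parent_group, parent_sid = parent
        if parent_sid == sid and parent_group != pgid:
            return False  # this parent can still do job control for the group
    return True
```

With a shell (PID 100, session 100) that started a job group 200, the group is anchored by the shell. Once the shell exits and the job leader is reparented to PID 1 in another session, the same predicate reports the group as orphaned.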
nohup vs setsid vs disown. All three protect processes from terminal hangup, but in completely different ways. nohup sets SIGHUP to ignored and, when stdout is a terminal, redirects output to nohup.out -- the process stays in the same session. setsid creates a new session -- the process is unreachable by the terminal's SIGHUP cascade. Bash's disown removes the job from bash's job table -- bash will not send SIGHUP to it on exit, but the kernel still will if the process group becomes orphaned while one of its members is stopped.
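The first two approaches have direct equivalents when launching a child from Python. A rough sketch, with hypothetical helper names; the nohup-style variant sends output to /dev/null rather than nohup.out to keep the example free of side effects.

```python
import os
import signal
import subprocess


def start_nohup_style(argv):
    """Same session as the caller, but SIGHUP is ignored in the child."""
    def ignore_hup():
        # An ignored signal stays ignored across exec().
        signal.signal(signal.SIGHUP, signal.SIG_IGN)
    return subprocess.Popen(
        argv,
        preexec_fn=ignore_hup,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )


def start_setsid_style(argv):
    """The child calls setsid() before exec, so it leads a new session."""
    return subprocess.Popen(
        argv,
        start_new_session=True,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
```

Comparing os.getsid() of the two children shows the difference: the nohup-style child reports the caller's SID, while the setsid-style child reports its own PID.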
Common Questions
How does tmux keep processes alive after terminal disconnect?
tmux's server process calls setsid() to create its own session. Each tmux window runs on a separate pseudo-terminal (pty) owned by the tmux server. On disconnect, only the client's terminal hangs up. The tmux server and the shells inside its windows belong to other sessions, so they are untouched. Reattaching starts a new client that connects to the existing server via a Unix domain socket.
Why does Ctrl+C kill all processes in a pipeline, not just one?
The terminal driver sends SIGINT to the entire foreground process group, not to a single process. All processes in a pipeline share the same PGID because the shell set it up that way. Take yes | head: pressing Ctrl+C delivers SIGINT to both yes and head at once. (In practice, head usually exits first once it has enough lines, and yes then gets SIGPIPE on its next write.)
What happens if a session leader opens a terminal device after setsid()?
If the session leader has no controlling terminal and opens a terminal device that is not already controlling another session, that terminal becomes the controlling terminal for the session (unless O_NOCTTY is used). The session leader can also use ioctl(TIOCSCTTY) to forcibly acquire a controlling terminal. login and sshd rely on this mechanism to set up the user's terminal session.
How does systemd handle process groups differently from SysV init?
systemd places each service in its own cgroup, which provides reliable process tracking regardless of process group or session tricks. A daemon that double-forks, calls setsid(), or spawns children in new groups cannot escape its cgroup. That is why KillMode=control-group (systemd's default) reliably kills all service processes on stop. SysV init relied on PID files and process groups, which are trivially bypassed.
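On Linux, cgroup membership is visible in /proc/PID/cgroup, which makes the "cannot escape" claim easy to check by hand. The sketch below assumes that file's hierarchy-ID:controllers:path line format; the function names are hypothetical, and on systems without /proc the lookup simply returns an empty list.

```python
import subprocess


def parse_cgroup(text):
    """Extract cgroup paths from the contents of /proc/PID/cgroup."""
    paths = []
    for line in text.splitlines():
        # Each line looks like: hierarchy-ID:controller-list:cgroup-path
        parts = line.split(":", 2)
        if len(parts) == 3:
            paths.append(parts[2])
    return paths


def cgroup_of(pid="self"):
    """Return the cgroup paths of a process, or [] if they are unreadable."""
    try:
        with open(f"/proc/{pid}/cgroup") as handle:
            return parse_cgroup(handle.read())
    except OSError:
        return []


def setsid_child_shares_cgroup():
    """Start a child in a new session and compare its cgroup with ours."""
    child = subprocess.Popen(["sleep", "5"], start_new_session=True)
    try:
        return cgroup_of(child.pid) == cgroup_of("self")
    finally:
        child.kill()
        child.wait()
```

The child started with start_new_session=True has a different SID and PGID from its parent, yet it reports the same cgroup path, which is the property systemd relies on.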
How Technologies Use This
Running systemctl stop on a service reports it as stopped. Days later, rogue processes are still running -- holding ports, leaking memory, and corrupting log files. The service had spawned grandchildren that called setsid() or setpgid(), escaping the original process group entirely.
SysV init used kill(-pgid) to stop services, which only reaches processes sharing the original process group. Any child that creates a new session or moves to a new group becomes invisible. On a server with 50+ services, these escapees slowly accumulate over days of operation, consuming memory and file descriptors.
systemd tracks services via cgroups instead of process groups. Every fork, double-fork, and setsid() still lands inside the same cgroup. When KillMode=control-group fires, SIGTERM reaches 100% of processes belonging to that service, regardless of their PGID or SID. No process escapes.
The Nginx master process manages 16 workers and receives SIGQUIT for graceful shutdown. If each worker had its own process group, the operator would need to track and signal each one individually -- error-prone and racy during config reloads when workers are constantly starting and stopping.
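Nginx workers are forked from the master and stay in its process group, so a single kill(-pgid, SIGQUIT) reaches the master and every worker at once. Workers finish in-flight requests (which may take 30+ seconds for large file downloads) and exit cleanly. In the usual case the operator signals only the master PID and the master relays the shutdown to its workers; either way, nobody outside Nginx has to track individual worker PIDs.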
For config reload, SIGHUP to the master PID forks new workers with the updated nginx.conf while old workers drain gracefully. This process-group design means Nginx manages its entire worker lifecycle with just two signals instead of complex IPC -- one signal to shut down, one signal to reload.
Same Concept Across Tech
| Concept | Docker | JVM | Node.js | Go | K8s |
|---|---|---|---|---|---|
| Process grouping | PID 1 in container is session leader; tini forwards signals to child group | JVM is single process group; Runtime.addShutdownHook catches SIGTERM | child_process.spawn(cmd, args, { detached: true }) creates new group | exec.Cmd with SysProcAttr{Setpgid: true} starts the child in a new group | kubelet sends SIGTERM to each container's PID 1, then SIGKILL after terminationGracePeriodSeconds |
| Session management | Each container gets its own PID namespace + session | N/A -- JVM does not call setsid() | N/A -- Node does not manage sessions directly | os/exec can set Setsid: true for daemon children | Each container init is session leader within its PID namespace |
| Signal routing | docker stop sends SIGTERM to PID 1 only; tini propagates to group | kill -TERM $JVM_PID; shutdown hooks run | process.on('SIGTERM') handler; cluster workers need explicit forwarding | signal.Notify(ch, syscall.SIGTERM) routes process-wide signals to a channel that a goroutine reads | preStop hook runs before SIGTERM delivery |
| Orphan handling | Zombie reaping requires PID 1 to wait(); tini or --init handles this | N/A -- JVM threads are not child processes | cluster.on('exit') must re-fork dead workers | cmd.Wait() must be called for every exec.Command | restartPolicy handles pod-level restarts, not process orphans |

| Stack Layer | Mechanism |
|---|---|
| Application | Calls setpgid()/setsid() to control group and session membership |
| Shell | Creates process groups per pipeline, manages foreground via tcsetpgrp() |
| Terminal driver | Routes SIGINT/SIGTSTP/SIGQUIT to foreground process group, SIGHUP on hangup |
| Kernel | Tracks each task's PGID and SID; delivers group-wide signals for kill(-pgid) |
| Init system | systemd uses cgroups (not process groups) for escape-proof service tracking |
Design rationale: The two-level hierarchy solves two different problems. Process groups let Ctrl+C target an entire pipeline at once. Sessions tie a collection of groups to a terminal's lifecycle so that hangup cascades to everything in that login. setsid() exists as the clean escape for long-lived daemons that must outlive the terminal that started them.
If You See This, Think This
| Symptom | Likely Cause | First Check |
|---|---|---|
| Ctrl+C only kills one process in a pipeline | Pipeline members not in the same process group | ps -eo pid,pgid,comm for the pipeline PIDs |
| Background job dies on SSH disconnect | Process still in the terminal's session; SIGHUP cascade reaches it | ps -o sid= -p $PID -- compare with login shell SID |
| nohup process still dies on logout | SIGHUP ignored but process stopped by SIGTTIN when reading terminal | Check process state with ps -o stat= -p $PID |
| systemctl stop leaves orphan processes | Children called setsid() escaping the process group kill | Check KillMode in unit file; use control-group mode |
| Background process frozen, cannot fg it | Orphaned process group -- no session member can resume it | ps -o pgid,stat for the process; look for T (stopped) state |
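| kill -TERM -$PGID returns "no such process" | No process carries that PGID any more: every member exited or moved to another group or session (a group outlives its leader as long as one member remains) | ps -eo pid,pgid,sid,comm to see which PGID the surviving processes actually have |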
When to Use / Avoid
- Use when building a shell or terminal multiplexer that needs to manage pipeline signal delivery
- Use when daemonizing a process -- setsid() is step one to detach from the controlling terminal
- Use when implementing graceful shutdown of multi-worker services via kill(-pgid, SIGTERM)
- Use when debugging why background jobs die on SSH disconnect or terminal close
- Avoid when cgroup-based tracking is available (systemd) -- cgroups are escape-proof, process groups are not
- Avoid when single-process services do not need group signal delivery
Try It Yourself
```shell
# Show PID, PPID, PGID, SID, and terminal for all processes
ps -eo pid,ppid,pgid,sid,tty,stat,comm | head -30

# Show the full process tree with session info
ps axjf | head -40

# Find the session leader for the current shell (the process whose PID equals the SID)
sid=$(ps -o sid= -p $$ | tr -d ' ')
ps -o pid,sid,comm -p "$sid"

# Check which process group is the foreground group of the terminal
# (field 8 of /proc/PID/stat is tpgid; assumes the command name has no spaces)
awk '{print "Foreground PGID: " $8}' /proc/$$/stat

# Create a new session (useful for daemon testing)
setsid bash -c 'echo "New session SID=$(ps -o sid= -p $$ | tr -d " ")"; sleep 30' &

# Send a signal to an entire process group.
# Set PID to a member of the target group; using $$ would signal the shell's own group.
kill -TERM -- -"$(ps -o pgid= -p "$PID" | tr -d ' ')"
```
Debug Checklist
1. ps -eo pid,ppid,pgid,sid,tty,stat,comm | head -40 -- map every process to its group and session
2. ps axjf -- tree view showing PPID, PID, PGID, SID boundaries
3. cat /proc/$PID/stat | awk '{print "PGID:"$5, "SID:"$6, "tty:"$7, "fgpgid:"$8}' -- raw kernel view of group membership
4. strace -f -e trace=setpgid,setsid,ioctl -p $SHELL_PID -- watch how the shell manages groups in real time (tcsetpgrp() is not a system call of its own; it shows up as ioctl(TIOCSPGRP))
5. kill -0 -$PGID 2>/dev/null && echo 'group alive' || echo 'group gone' -- test if a process group still exists
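The awk one-liner in the checklist splits /proc/PID/stat on whitespace, so a command name that contains spaces shifts every later field. A more defensive parse is sketched below; the function names are hypothetical and the field positions follow the layout the checklist already relies on (pgrp, session, tty_nr, tpgid after state and ppid).

```python
def parse_stat(text):
    """Parse the leading fields of a /proc/PID/stat line."""
    # The command name is wrapped in parentheses and may contain spaces
    # or ')' itself, so split around the LAST ')' instead of on spaces.
    head, _, tail = text.rpartition(")")
    pid, _, comm = head.partition(" (")
    fields = tail.split()
    return {
        "pid": int(pid),
        "comm": comm,
        "state": fields[0],
        "ppid": int(fields[1]),
        "pgid": int(fields[2]),
        "sid": int(fields[3]),
        "tty_nr": int(fields[4]),
        "tpgid": int(fields[5]),  # foreground PGID of the controlling tty
    }


def read_stat(pid="self"):
    """Read and parse /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as handle:
        return parse_stat(handle.read())
```

Comparing pgid with tpgid in the result tells whether the process is currently in its terminal's foreground group.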
Key Takeaways
- ✓ When a shell creates a pipeline (cmd1 | cmd2 | cmd3), all three processes go into the same process group. Ctrl+C sends SIGINT to the entire foreground group. That is why all three die at once, not one at a time.
- ✓ setsid() is the escape hatch. It creates a new session AND a new process group, with no controlling terminal. This is step one of daemonization, and it is why tmux sessions survive terminal disconnect.
- ✓ When a terminal hangs up (SSH disconnect, window closed), the kernel sends SIGHUP to the session leader. The shell then cascades SIGHUP to all its job process groups. That is why background jobs die when you log out -- unless they are in a different session.
- ✓ Background processes that try to read from the terminal get stopped with SIGTTIN. This prevents background jobs from stealing terminal input. Similarly, SIGTTOU stops background writers if the terminal's tostop flag is set.
- ✓ Orphaned process groups -- where no member has a parent in a different group within the same session -- get SIGHUP + SIGCONT if any member is stopped. This prevents stopped processes from being stuck forever when the shell exits.
Common Pitfalls
- ✗ Thinking nohup makes a process a daemon. Reality: nohup only ignores SIGHUP and redirects output. The process still shares the session and may receive other signals. For a real daemon, use setsid() + double-fork or systemd.
- ✗ Not calling setpgid() in both the parent (shell) and child after fork(). There is a race window: if the shell signals the process group before the child has set its own PGID, the signal goes to the wrong group. bash calls setpgid() from both sides to eliminate this race.
- ✗ Expecting all processes to die when you close the terminal. Processes that have called setsid(), that ignore SIGHUP, or that were removed from the shell's job table with disown will usually survive. tmux and screen work precisely by creating new sessions for their children.
- ✗ Confusing process group leader with session leader. The process group leader is the first process in a pipeline (PGID == its PID). The session leader is typically the login shell (SID == its PID). They serve different roles.
Reference
In One Line
Same group = one signal kills the whole pipeline; setsid() = the terminal's hangup can never reach it.