Process Groups, Sessions & Job Control

🧠

Mental Model

A school floor with classrooms and a PA system. Each classroom is a process group -- students working on the same project. The PA is the controlling terminal. When the principal makes an announcement (Ctrl+C), only the classroom currently presenting (foreground group) hears it. End of the school day (terminal hangup) -- the floor manager (session leader) tells every classroom to pack up. But a classroom that relocated to a separate building across the street (setsid) never hears the PA at all.

💡

The Problem

systemctl stop reports a service as stopped, but grandchildren that called setsid() escaped the process group kill. Over days, 50+ rogue workers accumulate, leaking 12 GB of memory and holding 3,000 file descriptors. An SSH disconnect sends SIGHUP to the session leader and kills a critical background pipeline mid-run -- 6 hours of processed output, corrupted. Three processes in a pipeline each landed in a separate group, so Ctrl+C only reaches one; the other two spin at 100% CPU eating pipe buffer memory.

Architecture

Type grep foo file | sort | head and hit Ctrl+C. All three processes die at the same time.

Stop and think about that. No signal was sent to each process individually. Two keys were pressed, and the kernel figured out that those three processes belong together and killed them as a unit. How?

Then the SSH session closes. The background make job dies. But the process started under tmux keeps running. Same logout, different outcomes. What determines which processes survive?

The answer to both questions is the same: process groups and sessions. They are the invisible layer between the terminal and the processes, and once they become visible, a dozen things that seemed like magic suddenly make sense.

What Actually Happens

The Unix process hierarchy has three levels: processes, process groups, and sessions.

Process groups. When a shell creates a pipeline, it puts all processes in that pipeline into the same process group. The shell calls setpgid() to assign the first process's PID as the PGID for all of them. Hitting Ctrl+C causes the terminal driver to send SIGINT to every process with that PGID. One keystroke, all processes in the pipeline.
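The grouping step can be sketched in a few lines of Python, whose os module wraps the same system calls. This is a minimal illustration, not how any particular shell is written: the helper names spawn_member and signal_pipeline are made up for the example, the children just sleep instead of running real pipeline stages, and SIGTERM stands in for the terminal's SIGINT because the Python interpreter installs its own SIGINT handler.

```python
import os
import signal
import time


def spawn_member(pgid):
    """Fork an idle child and place it in process group `pgid` (0 = start a new group)."""
    pid = os.fork()
    if pid == 0:
        try:
            time.sleep(30)  # stand-in for a real pipeline stage
        finally:
            os._exit(0)
    # Parent side, like a shell after fork(): pgid 0 means "use the child's own PID".
    os.setpgid(pid, pgid)
    return pid


def signal_pipeline(n=3, sig=signal.SIGTERM):
    """Build an n-member group, signal it once, and report what ended each member."""
    leader = spawn_member(0)
    members = [leader] + [spawn_member(leader) for _ in range(n - 1)]
    os.killpg(leader, sig)  # one call, delivered to every member of the group
    results = []
    for pid in members:
        _, status = os.waitpid(pid, 0)
        results.append(os.WTERMSIG(status) if os.WIFSIGNALED(status) else None)
    return results


if __name__ == "__main__":
    print(signal_pipeline())
```

Every entry in the returned list is the number of the signal that terminated that member, so a single killpg() call accounts for all of them.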

Sessions. A session is a collection of process groups. At login, the login process calls setsid() to create a new session. The login shell becomes the session leader (its PID equals its SID). Each command or pipeline launched gets its own process group within this session.
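A small Python sketch makes the "PID equals SID" property of a session leader concrete. The function name ids_after_setsid is hypothetical; the child reports its IDs over a pipe so the parent can compare them.

```python
import os


def ids_after_setsid():
    """Fork a child, call setsid() in it, and return the child's (pid, pgid, sid)."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.close(r)
            # Allowed here because a freshly forked child is never a group leader.
            os.setsid()
            report = f"{os.getpid()} {os.getpgrp()} {os.getsid(0)}"
            os.write(w, report.encode())
        finally:
            os._exit(0)
    os.close(w)
    data = os.read(r, 64)
    os.close(r)
    os.waitpid(pid, 0)
    child_pid, pgid, sid = map(int, data.split())
    return child_pid, pgid, sid


if __name__ == "__main__":
    child_pid, pgid, sid = ids_after_setsid()
    print("child pid:", child_pid, "pgid:", pgid, "sid:", sid)
    print("parent sid:", os.getsid(0))
```

After setsid() the child is both session leader and process group leader, so all three numbers match and differ from the parent's SID.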

The controlling terminal. Each session can have one controlling terminal -- the tty associated with the login. The terminal driver uses this to route keyboard signals. The shell uses tcsetpgrp() to designate one process group as the foreground group. Only the foreground group receives the signals generated by Ctrl+C (SIGINT), Ctrl+Z (SIGTSTP), and Ctrl+\ (SIGQUIT).

The hangup cascade. When the controlling terminal hangs up (SSH disconnects, terminal window closes), the kernel sends SIGHUP to the session leader. The shell catches it and forwards SIGHUP to every process group it manages. Then the shell exits. This cascade is why background jobs die on logout -- they are still in the same session.
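The first step of the cascade, the kernel's SIGHUP to the session leader, can be observed with a pseudo-terminal. The sketch below assumes Linux with /dev/ptmx available; hangup_demo is a made-up name. pty.fork() makes the child a session leader whose controlling terminal is the pty slave, and closing the master plays the role of the SSH connection dropping.

```python
import os
import pty
import signal
import time


def hangup_demo():
    """Return what the session leader reports after its terminal hangs up."""
    r, w = os.pipe()
    pid, master_fd = pty.fork()
    if pid == 0:
        # Child: session leader, controlling terminal = pty slave.
        try:
            os.close(r)

            def on_hup(signum, frame):
                os.write(w, b"SIGHUP")
                os._exit(0)

            signal.signal(signal.SIGHUP, on_hup)
            os.write(w, b"ready|")  # tell the parent the handler is installed
            time.sleep(10)          # give up quietly if no hangup arrives
        finally:
            os._exit(1)
    os.close(w)
    if os.read(r, 6) != b"ready|":
        raise RuntimeError("child did not start")
    os.close(master_fd)             # "disconnect": the slave side hangs up
    data = os.read(r, 6)
    os.close(r)
    os.waitpid(pid, 0)
    return data


if __name__ == "__main__":
    print(hangup_demo())
```

A real shell would do more in its handler: it would forward SIGHUP to each job's process group before exiting, which is the second half of the cascade described above.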

Processes started under tmux or screen live in different sessions. The hangup from the terminal's session cannot reach them.

Under the Hood

Foreground vs background groups. Only the foreground process group can read from the terminal. If a background process tries to read() from it, the kernel sends SIGTTIN (default action: stop). This prevents background jobs from stealing terminal input. Background writes are allowed by default; if the terminal's tostop flag is set, background writers get SIGTTOU (default action: stop) as well. The fg command calls tcsetpgrp() to promote a background group to foreground and sends SIGCONT if the job was stopped.

The setpgid() race. After fork(), there is a race. The shell wants to set the child's PGID, and the child wants to set its own PGID. If the shell sends a signal to the process group before both sides have completed their setpgid() calls, things go wrong. Bash and zsh solve this by calling setpgid() from both parent and child. If one call fails with EACCES (child already exec'd), that is fine -- the other side already handled it.
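The two-sided call looks roughly like this in Python. It is a sketch of the pattern, not bash's source; join_group, spawn_in_own_group, and demo are names invented for the example, and the sleeping interpreter stands in for a real command.

```python
import errno
import os
import signal
import sys


def join_group(pid, pgid):
    """setpgid() that tolerates losing the race to the other side."""
    try:
        os.setpgid(pid, pgid)
    except OSError as exc:
        # EACCES: the child already exec'd, which means it ran its own
        # setpgid() first. Anything else is a real error.
        if exc.errno != errno.EACCES:
            raise


def spawn_in_own_group(argv):
    """Start argv as the leader of a new process group; return its PID."""
    pid = os.fork()
    if pid == 0:
        try:
            join_group(0, 0)          # child side
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)             # only reached if setpgid or exec failed
    join_group(pid, pid)              # parent side
    return pid


def demo():
    pid = spawn_in_own_group([sys.executable, "-c", "import time; time.sleep(30)"])
    try:
        # Safe to rely on the group from here on, whichever side won the race.
        return os.getpgid(pid) == pid
    finally:
        os.killpg(pid, signal.SIGKILL)
        os.waitpid(pid, 0)


if __name__ == "__main__":
    print(demo())
```

Whichever call lands first does the work; by the time the parent's join_group() returns, the child is guaranteed to be in its group, so a later killpg() cannot miss it.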

Orphaned process groups. When a process group becomes orphaned (no member has a parent in a different group within the same session), the kernel checks if any member is stopped. If so, it sends SIGHUP followed by SIGCONT to the entire group. The reasoning: stopped processes in an orphaned group can never be resumed by job control (the shell is gone), so SIGHUP gives them a chance to clean up, and SIGCONT ensures they actually wake up to handle it.

nohup vs setsid vs disown. These all protect processes from terminal hangup, but in completely different ways. nohup ignores SIGHUP and redirects output -- the process stays in the same session. setsid creates a new session -- the process is unreachable by the terminal's SIGHUP cascade. Bash's disown removes the job from bash's job table -- bash will not send SIGHUP to it on exit, but the process stays in the same session and group, so the kernel can still send SIGHUP if that group becomes orphaned while a member is stopped.
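The difference between the first two can be checked from Python's subprocess module. This sketch imitates what the nohup and setsid commands do rather than invoking them: the "nohup" mode sets SIGHUP to ignored before exec, and the "setsid" mode uses start_new_session=True. The launch function and the PROBE script are hypothetical helpers. disown has no counterpart here because it only edits the shell's own job table.

```python
import os
import signal
import subprocess
import sys

# Probe run in the child: prints its PID, its SID, and whether SIGHUP is ignored.
PROBE = (
    "import os, signal; "
    "print(os.getpid(), os.getsid(0), "
    "signal.getsignal(signal.SIGHUP) == signal.SIG_IGN)"
)


def launch(mode):
    """Start the probe nohup-style or setsid-style and describe the result."""
    kwargs = {}
    if mode == "nohup":
        # An ignored signal stays ignored across exec().
        kwargs["preexec_fn"] = lambda: signal.signal(signal.SIGHUP, signal.SIG_IGN)
    elif mode == "setsid":
        kwargs["start_new_session"] = True  # child calls setsid() before exec()
    out = subprocess.run(
        [sys.executable, "-c", PROBE],
        stdout=subprocess.PIPE,
        check=True,
        **kwargs,
    ).stdout.split()
    pid, sid = int(out[0]), int(out[1])
    return {
        "new_session": sid == pid,
        "same_session_as_parent": sid == os.getsid(0),
        "ignores_sighup": out[2] == b"True",
    }


if __name__ == "__main__":
    for mode in ("nohup", "setsid"):
        print(mode, launch(mode))
```

The nohup-style child still shares the parent's session and merely shrugs off the signal; the setsid-style child leads a session of its own.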

Common Questions

How does tmux keep processes alive after terminal disconnect?

tmux's server process calls setsid() to create its own session. Each tmux window runs in a separate pseudo-terminal (pty) owned by the tmux server. On disconnect, only the client's terminal hangs up. The tmux server and the shells it launched belong to other sessions, so the hangup never reaches them. Reattaching creates a new client that connects to the existing server via a Unix domain socket.

Why does Ctrl+C kill all processes in a pipeline, not just one?

The terminal driver sends SIGINT to the entire foreground process group, not to a single process. All processes in a pipeline share the same PGID because the shell set it up that way. With yes | head, pressing Ctrl+C delivers SIGINT to both yes and head. (In practice, head usually exits first when it has enough lines, causing yes to get SIGPIPE on its next write.)

What happens if a session leader opens a terminal device after setsid()?

If the session leader has no controlling terminal and opens a terminal device that is not already controlling another session, that terminal becomes the controlling terminal for the session (unless O_NOCTTY is used). The session leader can also call ioctl(TIOCSCTTY) to acquire a controlling terminal explicitly. login and sshd rely on this mechanism to set up the user's terminal session.
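The open-time behaviour can be probed with a pseudo-terminal. This sketch assumes Linux semantics (some other systems require the explicit ioctl) and uses a made-up helper, acquires_ctty. It relies on the fact that tcgetpgrp() only succeeds on the caller's controlling terminal.

```python
import os


def acquires_ctty(extra_flags=0):
    """Open a fresh pty slave in a new session; report whether it became
    that session's controlling terminal."""
    master, slave = os.openpty()
    slave_name = os.ttyname(slave)
    os.close(slave)
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.close(r)
            os.setsid()  # new session, no controlling terminal yet
            fd = os.open(slave_name, os.O_RDWR | extra_flags)
            try:
                # Fails with ENOTTY unless fd is our controlling terminal.
                acquired = os.tcgetpgrp(fd) == os.getpgrp()
            except OSError:
                acquired = False
            os.write(w, b"1" if acquired else b"0")
        finally:
            os._exit(0)
    os.close(w)
    data = os.read(r, 1)
    os.waitpid(pid, 0)
    os.close(r)
    os.close(master)
    return data == b"1"


if __name__ == "__main__":
    print("plain open:", acquires_ctty())
    print("with O_NOCTTY:", acquires_ctty(os.O_NOCTTY))
```

Daemons that open terminal-like devices after setsid() typically pass O_NOCTTY for exactly this reason: it keeps them from picking up a controlling terminal by accident.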

How does systemd handle process groups differently from SysV init?

systemd places each service in its own cgroup, which provides reliable process tracking regardless of process group or session tricks. A daemon that double-forks, calls setsid(), or spawns children in new groups cannot escape its cgroup. That is why KillMode=control-group (systemd's default) reliably kills all service processes on stop. SysV init relied on PID files and process groups, which are trivially bypassed.

How Technologies Use This

systemd

Running systemctl stop on a service reports it as stopped. Days later, rogue processes are still running -- holding ports, leaking memory, and corrupting log files. The service had spawned grandchildren that called setsid() or setpgid(), escaping the original process group entirely.

SysV-style init scripts stopped services through PID files or kill(-pgid), which reach only the recorded PID or the processes still sharing the original process group. Any child that creates a new session or moves to a new group becomes invisible. On a server with 50+ services, these escapees slowly accumulate over days of operation, consuming memory and file descriptors.

systemd tracks services via cgroups instead of process groups. Every fork, double-fork, and setsid() still lands inside the same cgroup. When KillMode=control-group fires, SIGTERM reaches 100% of processes belonging to that service, regardless of their PGID or SID. No process escapes.
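The claim that setsid() does not change cgroup membership is easy to check on Linux, where each process's membership is listed in /proc/<pid>/cgroup. The sketch below assumes that file exists; the function names are invented for the example.

```python
import os


def cgroup_of_self():
    """Return the calling process's cgroup membership as reported by the kernel."""
    with open("/proc/self/cgroup") as f:
        return f.read()


def cgroup_after_setsid():
    """Fork a child that escapes into a new session, then report its cgroup."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.close(r)
            os.setsid()  # new session and new process group
            os.write(w, cgroup_of_self().encode())
        finally:
            os._exit(0)
    os.close(w)
    chunks = []
    while True:
        chunk = os.read(r, 4096)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(r)
    os.waitpid(pid, 0)
    return b"".join(chunks).decode()


if __name__ == "__main__":
    print("parent:", cgroup_of_self().strip())
    print("child after setsid():", cgroup_after_setsid().strip())
```

The child's SID and PGID change, but its cgroup path is the one it inherited at fork(), which is the property a cgroup-based kill depends on.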

Nginx

The Nginx master process manages 16 workers and receives SIGQUIT for graceful shutdown. If each worker had its own process group, the operator would need to track and signal each one individually -- error-prone and racy during config reloads when workers are constantly starting and stopping.

Nginx workers are forked by the master and never move to another group, so they stay in the master's process group. The operator signals only the master PID; the master, which tracks every worker it forked, relays the shutdown request to each of them. Workers finish in-flight requests (which may take 30+ seconds for large file downloads) and exit cleanly. The operator never has to enumerate worker PIDs.

For config reload, SIGHUP to the master PID forks new workers with the updated nginx.conf while old workers drain gracefully. From the operator's side, the entire worker lifecycle is driven by two signals sent to one PID -- SIGQUIT to shut down, SIGHUP to reload.

Same Concept Across Tech

Concept	Docker	JVM	Node.js	Go	K8s
Process grouping	PID 1 in container is session leader; tini forwards signals to its child (to the child's whole group with -g)	JVM is a single process in one group; Runtime.addShutdownHook hooks run on SIGTERM	child_process.spawn({ detached: true }) creates new group	exec.Cmd with SysProcAttr{Setpgid: true} starts the child in a new group	kubelet sends SIGTERM to each container's PID 1, then SIGKILL after terminationGracePeriodSeconds
Session management	Each container gets its own PID namespace + session	N/A -- JVM does not call setsid()	N/A -- Node does not manage sessions directly	exec.Cmd with SysProcAttr{Setsid: true} starts the child in a new session	Each container init is session leader within its PID namespace
Signal routing	docker stop sends SIGTERM to PID 1 only; tini propagates to group	kill -TERM $JVM_PID; shutdown hooks run	process.on('SIGTERM') handler; cluster workers need explicit forwarding	signal.Notify(ch, syscall.SIGTERM) relays process-wide signals to a channel	preStop hook runs before SIGTERM delivery
Orphan handling	Zombie reaping requires PID 1 to wait(); tini or --init handles this	N/A -- JVM threads are not child processes	cluster.on('exit') must re-fork dead workers	cmd.Wait() must be called for every started exec.Cmd	restartPolicy handles pod-level restarts, not process orphans

Stack Layer	Mechanism
Application	Calls setpgid()/setsid() to control group and session membership
Shell	Creates process groups per pipeline, manages foreground via tcsetpgrp()
Terminal driver	Routes SIGINT/SIGTSTP/SIGQUIT to foreground process group, SIGHUP on hangup
Kernel	Tracks the PGID and SID of every process; delivers group-wide signals via kill(-pgid)
Init system	systemd uses cgroups (not process groups) for escape-proof service tracking

Design rationale: The two-level hierarchy solves two different problems. Process groups let Ctrl+C target an entire pipeline at once. Sessions tie a collection of groups to a terminal's lifecycle so that hangup cascades to everything in that login. setsid() exists as the clean escape for long-lived daemons that must outlive the terminal that started them.

If You See This, Think This

Symptom	Likely Cause	First Check
Ctrl+C only kills one process in a pipeline	Pipeline members not in the same process group	ps -eo pid,pgid,comm for the pipeline PIDs
Background job dies on SSH disconnect	Process still in the terminal's session; SIGHUP cascade reaches it	ps -o sid= -p $PID -- compare with login shell SID
nohup process still dies on logout	SIGHUP ignored but process stopped by SIGTTIN when reading terminal	Check process state with ps -o stat= -p $PID
systemctl stop leaves orphan processes	Children called setsid() escaping the process group kill	Check KillMode in unit file; use control-group mode
Background process frozen, cannot fg it	Orphaned process group -- no session member can resume it	ps -o pgid,stat for the process; look for T (stopped) state
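kill -TERM -$PGID returns "no such process"	No process with that PGID remains: every member exited or moved to another group or session. A group stays valid after its leader exits for as long as one member is left	ps -eo pid,pgid,sid,comm to see which group the expected processes are in now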

When to Use / Avoid

Use when building a shell or terminal multiplexer that needs to manage pipeline signal delivery
Use when daemonizing a process -- setsid() is step one to detach from the controlling terminal
Use when implementing graceful shutdown of multi-worker services via kill(-pgid, SIGTERM)
Use when debugging why background jobs die on SSH disconnect or terminal close
Avoid when cgroup-based tracking is available (systemd) -- cgroups are escape-proof, process groups are not
Avoid when single-process services do not need group signal delivery

Try It Yourself

# Show PID, PPID, PGID, SID, and terminal for all processes
ps -eo pid,ppid,pgid,sid,tty,stat,comm | head -30

# Show the full process tree with session info
ps axjf | head -40

# Find the session leader for the current shell (the process whose PID equals the SID)
sid=$(ps -o sid= -p $$ | tr -d ' ')
ps -o pid,sid,comm -p "$sid"

# Check which process group is the foreground group of this shell's terminal
ps -o tpgid= -p $$

# Create a new session (useful for daemon testing)
setsid bash -c 'echo "New session SID=$(ps -o sid= -p $$)"; sleep 30' &

# Send a signal to an entire process group. Run this in an interactive shell,
# where each background job gets its own group; signalling the shell's own
# group would kill the shell as well.
sleep 300 | sleep 300 &
kill -TERM -- -"$(ps -o pgid= -p $! | tr -d ' ')"

Debug Checklist

1. ps -eo pid,ppid,pgid,sid,tty,stat,comm | head -40 -- map every process to its group and session
2. ps axjf -- tree view showing PPID, PID, PGID, SID boundaries
3. cat /proc/$PID/stat | awk '{print "PGID:"$5, "SID:"$6, "tty:"$7, "fgpgid:"$8}' -- raw kernel view of group membership
4. strace -e trace=setpgid,setsid,tcsetpgrp -p $SHELL_PID -- watch how the shell manages groups in real time
5. kill -0 -$PGID 2>/dev/null && echo 'group alive' || echo 'group gone' -- test if a process group still exists
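The last check has a direct programmatic form: signal 0 tests for existence without delivering anything. The sketch below uses hypothetical helper names and shows a detail that is easy to get wrong when debugging: a process group exists for as long as any member does, even after its leader is gone.

```python
import os
import signal
import time


def group_alive(pgid):
    """True if at least one process still belongs to process group `pgid`."""
    try:
        os.killpg(pgid, 0)  # signal 0: existence and permission check only
    except ProcessLookupError:
        return False
    except PermissionError:
        return True         # the group exists but belongs to another user
    return True


def _idle_child(pgid):
    """Fork a sleeping child and put it in group `pgid` (0 = new group)."""
    pid = os.fork()
    if pid == 0:
        try:
            time.sleep(30)
        finally:
            os._exit(0)
    os.setpgid(pid, pgid)
    return pid


def leader_exit_demo():
    """Kill the group leader first, then the last member, probing after each."""
    leader = _idle_child(0)
    member = _idle_child(leader)
    os.kill(leader, signal.SIGKILL)
    os.waitpid(leader, 0)
    after_leader = group_alive(leader)  # member is still in the group
    os.kill(member, signal.SIGKILL)
    os.waitpid(member, 0)
    after_all = group_alive(leader)     # nobody left
    return after_leader, after_all


if __name__ == "__main__":
    print(leader_exit_demo())
```

On a typical Linux system this prints (True, False): the group survives its leader and disappears only with its last member.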

Key Takeaways

✓ When a shell creates a pipeline (cmd1 | cmd2 | cmd3), all three processes go into the same process group. Ctrl+C sends SIGINT to the entire foreground group. That is why all three die at once, not one at a time.
✓ setsid() is the escape hatch. It creates a new session AND a new process group, with no controlling terminal. This is step one of daemonization, and it is why tmux sessions survive terminal disconnect.
✓ When a terminal hangs up (SSH disconnect, window closed), the kernel sends SIGHUP to the session leader. The shell then cascades SIGHUP to all its job process groups. That is why background jobs die when you log out -- unless they are in a different session.
✓ Background processes that try to read from the terminal get stopped with SIGTTIN. This prevents background jobs from stealing terminal input. Similarly, SIGTTOU stops background writers if the terminal's tostop flag is set.
✓ Orphaned process groups -- where no member has a parent in a different group within the same session -- get SIGHUP + SIGCONT if any member is stopped. This prevents stopped processes from being stuck forever when the shell exits.

Common Pitfalls

✗ Thinking nohup makes a process a daemon. Reality: nohup only ignores SIGHUP and redirects output. The process still shares the session and may receive other signals. For a real daemon, use setsid() + double-fork or systemd.
✗ Not calling setpgid() in both the parent (shell) and child after fork(). There is a race window: if the shell signals the process group before the child has set its own PGID, the signal goes to the wrong group. bash calls setpgid() from both sides to eliminate this race.
✗ Expecting all processes to die when you close the terminal. Processes that have called setsid(), or that were started inside another session, are outside the hangup cascade and will not receive SIGHUP. tmux and screen work precisely by running their children in sessions of their own.
✗ Confusing process group leader with session leader. The process group leader is the first process in a pipeline (PGID == its PID). The session leader is typically the login shell (SID == its PID). They serve different roles.

Reference

System Calls

setpgid, setsid, tcsetpgrp, getpgrp, tcgetpgrp

Tools

ps -eo pid,ppid,pgid,sid,tty,stat,comm
ps axjf
strace -e trace=setpgid,setsid,tcsetpgrp bash

📌

In One Line

Same group = one signal kills the whole pipeline; setsid() = the terminal's hangup can never reach it.
