ptrace: Process Tracing & Debugging
Mental Model
A puppet theater. The puppet moves on stage performing its routine. Behind the curtain, a puppeteer can freeze it mid-motion, inspect exactly how the strings are positioned, adjust them to alter the next move, plant a tripwire on stage that halts the puppet when stepped on, or demand the puppet pause and report every time it walks through a doorway. One puppeteer per puppet -- no sharing.
The Problem
A production service stops responding. No logs, no errors, no crash dump -- engineers spend 4+ hours restarting and guessing without ever seeing the process's live state. In a different scenario, a compromised container ptrace-attaches to a neighbor and extracts database credentials from memory in 200 ms, leaving zero application-level trace. And strace on a high-throughput service (50,000 syscalls/sec) adds two context switches per syscall, slowing it 10-100x -- naive tracing in production is not an option.
Architecture
When a process goes silent like that, someone needs to look inside its brain. What syscall is it stuck on? What file is it trying to open? What arguments is it passing?
That is what ptrace does. Every time strace, GDB, or any other debugger runs on Linux, ptrace is doing the actual work.
What Actually Happens
ptrace establishes a tracer-tracee relationship between two processes. The typical flow:
For launching a new program (GDB run): fork() a child. The child calls ptrace(PTRACE_TRACEME), then execve(). The kernel stops the child at the first instruction of the new program and notifies the parent via waitpid(). GDB now has full control.
For attaching to a running process (strace -p PID): The tracer calls ptrace(PTRACE_ATTACH, PID) or PTRACE_SEIZE. The kernel sends SIGSTOP to the tracee (for ATTACH) or simply marks it as traced (for SEIZE). The tracee enters TASK_TRACED state, and the tracer can inspect everything.
For syscall tracing (strace): PTRACE_SYSCALL resumes the tracee and stops it at the next syscall boundary. Each syscall produces two stops: at entry (read syscall number from orig_rax, arguments from rdi/rsi/rdx/r10/r8/r9) and at exit (read return value from rax). This two-stop pattern is why strace can show both arguments and results -- and why it adds 10-100x overhead.
For breakpoints (GDB): GDB uses PTRACE_POKETEXT to overwrite the first byte of a target instruction with 0xCC (x86 INT3). When the CPU hits INT3, it generates SIGTRAP, stopping the tracee. GDB reads registers, restores the original byte, single-steps past it, re-inserts the breakpoint, and continues.
Under the Hood
Signal injection and suppression. When a tracee stops due to a signal, the tracer can suppress it (pass 0 to PTRACE_CONT) or inject a different one. GDB suppresses SIGTRAP from breakpoints because they were set intentionally. It can also intercept SIGSEGV, examine the fault, and decide whether to let the signal through.
PTRACE_O_TRACESYSGOOD solves ambiguity. By default, both signal stops and syscall stops deliver SIGTRAP, making them indistinguishable. Setting this option causes syscall stops to set bit 7 (delivering SIGTRAP | 0x80), so the tracer reliably knows "this stop is a syscall" vs "this is a real signal."
/proc/PID/mem is faster than PTRACE_PEEKDATA. Modern tracers read memory via /proc/PID/mem with a single pread() call for arbitrary sizes. PTRACE_PEEKDATA reads one word (8 bytes) per syscall. For a 4KB page, that is 1 syscall vs 512.
Anti-debugging via self-tracing. Since only one tracer can attach to a process, calling ptrace(PTRACE_TRACEME) early blocks later debugger attachments with EPERM. Common in commercial software and malware. Bypassed with LD_PRELOAD or CAP_SYS_PTRACE.
Common Questions
How does strace work internally?
It forks a child (or PTRACE_ATTACHes to a running process), sets PTRACE_O_TRACESYSGOOD, and loops: PTRACE_SYSCALL to resume, waitpid() to catch the next stop, PTRACE_GETREGS to read registers. At entry, decode orig_rax and arguments. At exit, read rax for the return value. The -f flag uses PTRACE_O_TRACEFORK to follow children.
Why is strace so slow?
Two context switches per syscall (entry stop + exit stop), plus ptrace syscall overhead for register reads. That is ~10-20 microseconds per traced syscall. Alternatives: perf trace (sampling-based, much lower overhead), eBPF tracing (bpftrace, sysdig -- runs in-kernel without stopping the target), and seccomp with SECCOMP_RET_LOG for audit logging.
How does GDB implement hardware watchpoints?
GDB writes the watched address into debug registers DR0-DR3 and sets type/length bits in DR7 via PTRACE_POKEUSER. The CPU monitors these addresses in hardware -- when accessed, it generates a debug exception delivered as SIGTRAP. Zero runtime overhead, but limited to 4 addresses on x86.
What are the security implications?
A process with ptrace access can read passwords, encryption keys, and API tokens from memory. It can modify code to bypass security checks, inject shellcode, and hijack control flow. That is why Ubuntu defaults to ptrace_scope=1, containers drop CAP_SYS_PTRACE, and Docker's seccomp profile blocks it entirely.
How Technologies Use This
A compromised container calls ptrace(PTRACE_ATTACH) on another container's process and reads database passwords straight from memory. Without restrictions, any process with the same UID can attach to another and dump its entire address space in milliseconds.
The danger is that ptrace grants complete read/write access to another process's memory and registers. An attacker who can attach to a database process can extract connection strings, encryption keys, and user data without ever touching the filesystem or network. The attack leaves no application-level log trail because it operates entirely through memory inspection.
Docker's default seccomp profile blocked the ptrace syscall outright for years (it has been permitted since Docker 19.03 on kernels 4.8+), and CAP_SYS_PTRACE is dropped from the container's capability bounding set, so even a root process inside the container cannot trace outside the restrictions. Need strace or GDB for debugging? Grant it explicitly with --cap-add SYS_PTRACE -- and never in production, where it widens the container's attack surface to every process it can reach.
A compromised renderer tab calls open() on /etc/passwd or connect() to an external server, and the syscalls go straight to the kernel with no chance for the browser process to inspect or reject them. seccomp blocks known-bad syscalls, but cannot make nuanced decisions about syscall arguments.
The gap is that seccomp-bpf filters make binary allow/deny decisions based on syscall numbers and arguments, but cannot perform complex policy checks like verifying that a file path is safe or that a network destination is authorized. Some syscalls need contextual validation that only a privileged supervisor process can provide.
Chrome's sandbox closes this gap with a two-layer design: the renderer's seccomp-bpf filter makes the cheap allow/deny decisions in-kernel, and syscalls that need contextual validation (file opens, network connections) are delegated to the privileged browser process, which performs them on the renderer's behalf after policy checks. seccomp can also escalate a flagged syscall to a ptrace-based supervisor via SECCOMP_RET_TRACE, letting the supervisor inspect arguments before deciding. Per-syscall interception of this kind costs on the order of 10-20us, so it is reserved for the rare operations that need it while the hot path stays in-kernel.
Same Concept Across Tech
| Concept | Docker | JVM | Node.js | Go | K8s |
|---|---|---|---|---|---|
| Debugging attach | --cap-add SYS_PTRACE required; default seccomp historically blocked ptrace | jcmd/jstack use the JVM attach mechanism, not ptrace; remote debug via JDWP | node --inspect uses V8 debug protocol, not ptrace | Delve debugger uses ptrace on Linux | ephemeral debug containers with SYS_PTRACE capability |
| Syscall tracing | strace inside container needs SYS_PTRACE cap | strace on JVM PID shows JNI and native syscalls | strace on node PID reveals libuv I/O patterns | strace on Go binary shows raw syscalls (no libc) | kubectl debug with strace image |
| Security restriction | Default seccomp + dropped CAP_SYS_PTRACE | Yama ptrace_scope limits who can attach | Same Yama restrictions apply | Same Yama restrictions apply | PodSecurityPolicy/PSA blocks SYS_PTRACE in production |
| Production alternative | eBPF sidecar for tracing | JFR (Java Flight Recorder) for in-process tracing | --perf-basic-prof + perf for sampling | runtime/trace + pprof for Go-native profiling | Pixie, Cilium Hubble for eBPF-based observability |

| Stack Layer | Mechanism |
|---|---|
| Application | GDB, LLDB, strace, ltrace -- all frontends to ptrace |
| Language runtime | Delve (Go), JDWP (Java), V8 Inspector (Node) may bypass ptrace with protocol-level debugging |
| Kernel ptrace subsystem | Validates CAP_SYS_PTRACE, Yama scope; manages TASK_TRACED state and signal routing |
| Security modules | Yama LSM enforces ptrace_scope policy; seccomp-bpf can block the ptrace syscall entirely |
| Hardware | x86 debug registers DR0-DR3 for hardware breakpoints/watchpoints; INT3 (0xCC) for software breakpoints |
Design rationale: Putting register access, memory inspection, syscall interception, and breakpoints behind a single syscall keeps the debugging interface simple and universal. The cost -- two context switches per traced syscall and exclusive single-tracer access -- is acceptable for debugging but pushed production tracing toward in-kernel alternatives like eBPF that run without stopping the target.
If You See This, Think This
| Symptom | Likely Cause | First Check |
|---|---|---|
| strace fails with "Operation not permitted" | Yama ptrace_scope restricts non-parent tracing | cat /proc/sys/kernel/yama/ptrace_scope |
| Cannot attach debugger to containerized process | CAP_SYS_PTRACE dropped and seccomp blocks ptrace | docker inspect --format '{{.HostConfig.CapAdd}}' $CONTAINER |
| strace shows process stuck on futex() or epoll_wait() | Process waiting on lock or I/O event -- not a bug, just idle | strace -e trace=futex -c to measure wait frequency |
| GDB breakpoint causes SIGILL instead of stopping | Breakpoint set on non-executable memory or misaligned address | info breakpoints in GDB; check /proc/$PID/maps for segment permissions |
| Application 10x slower under strace | Two stops per syscall at high syscall rate | strace -c to count syscalls; switch to perf trace or bpftrace |
| "already being traced" error on ptrace attach | Another tracer (debugger, strace, security tool) already attached | grep TracerPid /proc/$PID/status |
When to Use / Avoid
- Use when debugging a hung process with no logs -- strace reveals the blocking syscall
- Use when profiling syscall patterns to build a seccomp whitelist for containers
- Use when setting breakpoints or inspecting memory in a live process via GDB
- Use when tracing library calls (ltrace) to diagnose dynamic linking issues
- Avoid in production for continuous monitoring -- 10-100x overhead; use eBPF or perf trace instead
- Avoid when multiple tools need simultaneous tracing -- only one tracer can attach at a time
Try It Yourself
# Trace all syscalls of a command with timing info
strace -T -f ls /tmp 2>&1 | head -30

# Count syscalls by type (summary mode)
strace -c -f curl -s https://example.com 2>&1 | tail -20

# Trace only file-related syscalls of a running process
strace -e trace=openat,read,write,close -p $$ 2>&1 &
sleep 1; kill %1

# Check Yama ptrace_scope setting
cat /proc/sys/kernel/yama/ptrace_scope
# 0=classic, 1=parent-only (Ubuntu default), 2=admin-only, 3=none

# Show which processes are being traced
grep -l TracerPid /proc/[0-9]*/status 2>/dev/null | while read f; do
  pid=$(echo $f | cut -d/ -f3)
  tracer=$(grep TracerPid $f | awk '{print $2}')
  [ "$tracer" != "0" ] && echo "PID $pid traced by $tracer"
done

# Read a process's memory map (useful for ptrace targets)
cat /proc/$$/maps | head -10

Debug Checklist
1. strace -p $PID -e trace=openat,read,write -T 2>&1 | head -50 -- see what files and I/O a process is doing
2. strace -c -f -p $PID -- aggregate syscall counts and time for a running process
3. cat /proc/sys/kernel/yama/ptrace_scope -- check ptrace security policy (0-3)
4. grep TracerPid /proc/$PID/status -- check if a process is already being traced
5. cat /proc/$PID/maps | head -10 -- view memory layout before attaching with ptrace
6. strace -e trace=network -f -p $PID 2>&1 | head -30 -- trace network syscalls across all threads
Key Takeaways
- ✓ strace stops the tracee at every syscall entry AND exit -- two stops per syscall. At entry it reads the number from orig_rax and arguments from registers. At exit it reads the return value from rax. This two-stop pattern is why strace slows programs by 10-100x.
- ✓ GDB sets breakpoints by overwriting instruction bytes with 0xCC (INT3). When the CPU hits INT3, it generates SIGTRAP. GDB restores the original byte, single-steps past it, re-inserts the breakpoint, and continues. Hardware breakpoints use debug registers DR0-DR3.
- ✓ Yama LSM restricts ptrace via /proc/sys/kernel/yama/ptrace_scope: 0 = any process can trace any other, 1 = only parent can trace child (Ubuntu default), 2 = only CAP_SYS_PTRACE, 3 = no ptrace at all. This prevents malware from reading secrets out of other processes' memory.
- ✓ PTRACE_SEIZE (Linux 3.4+) is preferred over PTRACE_ATTACH. It does not send SIGSTOP (avoids race conditions), enables PTRACE_EVENT_STOP, and allows PTRACE_INTERRUPT for on-demand stopping.
- ✓ Only one tracer can attach to a process at a time. You cannot strace a process that GDB is already debugging. A process can also self-trace via PTRACE_TRACEME to block later debugger attachments -- a common anti-debugging technique.
Common Pitfalls
- ✗ Mistake: Not calling waitpid() after PTRACE_ATTACH. Reality: The tracee does not stop synchronously. PTRACE_ATTACH sends SIGSTOP, and you must waitpid() for the stop before issuing other ptrace commands. Reading registers before the tracee stops gives stale data.
- ✗ Mistake: Reading the syscall number from rax at syscall exit. Reality: The return value overwrites rax. The original syscall number is in orig_rax (offset 120 in user_regs_struct). Use ORIG_RAX, not RAX.
- ✗ Mistake: Forgetting to handle PTRACE_EVENT_* stops after setting PTRACE_O_TRACEFORK. Reality: Fork events produce a PTRACE_EVENT_FORK stop, not a signal stop. Use PTRACE_GETEVENTMSG to get the child PID and PTRACE_CONT to resume. Missing these events hangs the tracee indefinitely.
- ✗ Mistake: Using ptrace for production monitoring. Reality: ptrace stops the tracee for each operation, adding ~10-20us per syscall. For production, use eBPF or seccomp with SECCOMP_RET_LOG -- they run in-kernel without stopping the target.
Reference
In One Line
strace for quick diagnosis, eBPF for production -- ptrace stops the target on every operation, which is fine for debugging but lethal for throughput.