Systemd Internals
Mental Model
A factory floor with machines (services), power outlets (sockets), shift schedules (timers), and a floor manager (PID 1). The floor manager does not personally operate any machine. The manager reads blueprints (unit files), wires up power outlets before machines are plugged in (socket activation), enforces capacity limits per workstation (cgroups), and records everything in a logbook (journal). Machines can declare "I need machine B running before I start" (After=) and "I cannot function without machine B" (Requires=). The floor manager figures out the fastest parallel startup order that satisfies all constraints. If two machines create a circular "I need the other first" situation, the manager breaks the weaker link and logs the decision.
The Problem
A critical service fails to start after a system update. Running systemctl status reports "Job for payment-gateway.service deleted to break ordering cycle." The dependency graph has a cycle: payment-gateway requires network-online.target, which pulls in NetworkManager-wait-online.service, which has an ordering dependency on dbus.service, which in turn has a Wants= on payment-gateway.service added by a misconfigured drop-in file. Systemd detects the cycle at transaction compilation time, breaks it by deleting the weakest job, and the service never starts. The fix requires identifying and removing the circular dependency, but the error message alone does not reveal which edge created the cycle.
Architecture
A fresh Linux virtual machine finishes its BIOS handoff, loads the kernel, and then PID 1 takes over. In under two seconds, dozens of services are running: networking, logging, D-Bus, SSH, container runtimes, application servers. On a SysVinit system, those same services would start one by one, each shell script waiting for the previous one to finish. The difference is not incremental. It is architectural.
Systemd is not just an init system. It is a service manager, a cgroup manager, a logging daemon, a device manager, a login manager, a timer system, and an IPC bus client, all running as PID 1. Understanding its internals explains why services start in the order they do, why some services restart automatically and others do not, and why "Job deleted to break ordering cycle" is one of the most confusing errors on a modern Linux system.
Unit Types and What They Do
Everything systemd manages is a unit. A unit has a type, a name, and a configuration file.
Service units (.service) are the most common. A service unit describes a process to run: the binary, its arguments, environment variables, and lifecycle (Type=simple, forking, notify, oneshot, dbus, exec, idle). The [Service] section defines ExecStart=, ExecStop=, Restart=, and resource limits.
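A minimal unit tying these directives together might look like this (the unit name, binary path, and limits are hypothetical examples, not from any real package):

```ini
# /etc/systemd/system/myapp.service -- hypothetical example unit
[Unit]
Description=Example application server
Wants=network-online.target
After=network-online.target

[Service]
Type=notify
ExecStart=/usr/local/bin/myapp --config /etc/myapp.conf
Restart=on-failure
RestartSec=5s
MemoryMax=512M

[Install]
WantedBy=multi-user.target
```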
Socket units (.socket) define a listening socket (TCP, UDP, Unix domain, netlink, or FIFO). The socket is created and bound by systemd, not by the service. When a connection arrives, systemd starts the associated service and hands over the file descriptor. This decouples the socket from the service process.
Timer units (.timer) trigger a service on a schedule. OnCalendar=Mon..Fri 03:00 runs at 3 AM on weekdays. OnBootSec=5min runs five minutes after boot. OnUnitActiveSec=1h runs hourly relative to the last activation. RandomizedDelaySec= staggers execution across a fleet.
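A timer is a pair of units: the .timer defines the schedule, and a same-named .service defines what runs. A sketch of the schedule side (the unit name is hypothetical):

```ini
# nightly-report.timer -- hypothetical schedule unit; pairs with nightly-report.service
[Unit]
Description=Run the nightly report on weekday mornings

[Timer]
OnCalendar=Mon..Fri 03:00
RandomizedDelaySec=15min
Persistent=true

[Install]
WantedBy=timers.target
```

Persistent=true runs the job at the next boot if the machine was off at the scheduled time; systemctl list-timers shows the next trigger.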
Mount units (.mount) manage filesystem mounts. Each mount point has a corresponding unit (auto-generated from /etc/fstab by systemd-fstab-generator, or explicitly defined). Mount units participate in the dependency graph: a service with RequiresMountsFor=/data waits for that mount.
Slice units (.slice) define cgroup subtrees for hierarchical resource management. system.slice contains all system services, user.slice contains per-user sessions. A custom slice like batch.slice can group resource-intensive services under shared limits.
Target units (.target) are grouping units with no process of their own. They synchronize boot stages. multi-user.target means "all non-graphical services are ready." network-online.target means "at least one network interface has a routable address."
Other types include path units (inotify-based file watching), device units (udev integration), scope units (externally started process groups), and swap units (swap partitions).
The Dependency Graph
Systemd's core data structure is a directed graph of units.
Activation dependencies control what starts:
- Requires=B means if A starts, B must also start. If B fails, A is stopped.
- Wants=B means if A starts, B should also start. If B fails, A continues.
- BindsTo=B is like Requires= but also stops A if B stops at any time.
- Conflicts=B means starting A stops B and vice versa.
Ordering dependencies control when:
- After=B means A starts only after B has finished starting.
- Before=B means A finishes starting before B begins.
These are independent. Requires=B without After=B means A and B start simultaneously. After=B without Requires=B means A waits for B only if B happens to be starting.
When a start request comes in, systemd's transaction engine computes a set of jobs (start, stop, reload) that satisfy all constraints. If the graph contains a cycle -- A after B, B after C, C after A -- the transaction cannot be computed. Systemd breaks the cycle by dropping the weakest edge (a Wants edge is weaker than a Requires edge) and logs: "Job for X.service deleted to break ordering cycle."
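The cycle detection itself is ordinary depth-first search over the After= edges. A minimal sketch, not systemd's actual transaction code, using the unit names from the problem statement:

```python
# Detect an ordering cycle in a graph of After= edges.
# after[a] = set of units that must finish starting before a.
def find_cycle(after):
    """Return a list of units forming a cycle (first == last), or None."""
    WHITE, GRAY, BLACK = 0, 1, 2          # unvisited / on stack / done
    color = {u: WHITE for u in after}
    stack = []

    def visit(u):
        color[u] = GRAY
        stack.append(u)
        for dep in after.get(u, ()):
            state = color.get(dep, WHITE)
            if state == GRAY:              # back edge: dep is on the stack
                return stack[stack.index(dep):] + [dep]
            if state == WHITE:
                cyc = visit(dep)
                if cyc:
                    return cyc
        stack.pop()
        color[u] = BLACK
        return None

    for u in list(after):
        if color[u] == WHITE:
            cyc = visit(u)
            if cyc:
                return cyc
    return None

edges = {
    "payment-gateway.service": {"network-online.target"},
    "network-online.target": {"NetworkManager-wait-online.service"},
    "NetworkManager-wait-online.service": {"dbus.service"},
    "dbus.service": {"payment-gateway.service"},   # the misconfigured drop-in
}
cycle = find_cycle(edges)
print(" -> ".join(cycle))
```

Systemd does more than report the cycle: it ranks the edges by strength and deletes the job behind the weakest one, which is why the symptom is a silently missing service rather than a refused boot.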
# Visualize dependencies for a specific service
systemctl list-dependencies nginx.service --all
# Export the full graph as DOT format for Graphviz
systemd-analyze dot --to-pattern='*.service' --from-pattern='*.target' | dot -Tsvg -o boot-graph.svg
# Check for cycles and errors in a unit file
systemd-analyze verify /etc/systemd/system/myapp.service
Socket Activation
This is the mechanism that makes parallel boot work.
Traditional init starts services sequentially because of implicit socket dependencies. Service A needs to connect to service B's port. If B has not started yet, A fails. So B must start first. Every such dependency serializes the boot.
Systemd's insight: create the sockets first, then start the services in parallel. If service A tries to connect to service B's socket before B is ready, the connection lands in the kernel's socket backlog. When B finally starts and calls accept(), it picks up the queued connection. No failure, no ordering constraint.
The implementation:
- PID 1 reads all .socket units and creates the sockets (socket/bind/listen).
- PID 1 starts all services in parallel.
- When a service starts, systemd passes its socket file descriptors through the LISTEN_FDS environment variable.
- The service calls sd_listen_fds() to retrieve them.
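The kernel backlog behavior that makes this safe can be demonstrated without systemd at all: a client can connect and even send data before the "service" ever calls accept(). The port and message below are arbitrary for the demo:

```python
import socket

# "systemd" side: create, bind, and listen -- but do NOT accept() yet.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))     # ephemeral port
listener.listen(16)                 # up to 16 connections queue in the backlog
addr = listener.getsockname()

# "client" side: connect while the service is still 'starting'.
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(addr)                # succeeds: queued by the kernel
client.sendall(b"hello")            # data buffers in the kernel too

# "service" side: starts late, picks up the already-queued connection.
conn, _ = listener.accept()
data = b""
while len(data) < 5:
    data += conn.recv(5 - len(data))
print(data)                         # b'hello'
conn.close(); client.close(); listener.close()
```

In the real protocol the listener is created by PID 1 and handed to the service as fd 3 (and up), with LISTEN_FDS giving the count and LISTEN_PID guarding against inherited leaks.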
# List all active socket units and their listening addresses
systemctl list-sockets
# Check if a socket unit is active (listening)
systemctl is-active dbus.socket
# Manually trigger socket activation by connecting
systemctl start webapp.socket
curl http://localhost:8080 # This starts webapp.service
Socket activation also enables on-demand service startup. A service that receives traffic once a day does not need to run continuously. Its socket unit listens, and systemd starts the service only when a connection arrives.
Cgroup Integration
Systemd is the cgroup manager. Period.
On a systemd system, the cgroup hierarchy under /sys/fs/cgroup/ is owned by PID 1. Every service, every user session, every container scope gets its own cgroup. The hierarchy looks like this:
/sys/fs/cgroup/
system.slice/
nginx.service/
postgresql.service/
sshd.service/
user.slice/
user-1000.slice/
session-1.scope/
machine.slice/
docker-abc123.scope/
Unit files map directly to cgroup v2 controller knobs:
| Unit directive | Cgroup v2 file | Effect |
|---|---|---|
| MemoryMax=2G | memory.max | Hard memory limit, triggers OOM within the cgroup |
| MemoryHigh=1G | memory.high | Throttle point, kernel reclaims aggressively |
| CPUQuota=200% | cpu.max | Bandwidth limit (200% = 2 full cores) |
| CPUWeight=50 | cpu.weight | Proportional share scheduling |
| TasksMax=512 | pids.max | Limit number of processes/threads |
| IOWeight=100 | io.weight | Proportional I/O scheduling |
The critical consequence: when systemd stops a service, it sends SIGTERM to the main process, waits TimeoutStopSec (default 90s), then sends SIGKILL to every process in the cgroup. No child process, no matter how deeply forked, escapes. This was impossible with SysVinit's PID-file-based tracking, where a daemon's grandchild processes would be orphaned.
# View the cgroup tree for all services
systemd-cgls --no-pager
# Real-time resource usage per service (like top for cgroups)
systemd-cgtop
# Show memory and CPU usage for a specific service
systemctl show nginx.service -p MemoryCurrent,CPUUsageNSec,TasksCurrent
# Create a transient cgroup-limited scope for a one-off command
systemd-run --scope -p MemoryMax=256M -p CPUQuota=50% ./build-script.sh
The Journal
journald captures stdout, stderr, and syslog output from every service and indexes it by dozens of fields: unit name, PID, UID, GID, boot ID, transport, syslog facility, priority level, and custom fields. The binary format allows O(log n) seeks by timestamp and field-based filtering without scanning the entire log.
# Show logs for a service from the last 10 minutes
journalctl -u nginx.service --since "10 min ago"
# Show only errors and above from the previous boot
journalctl -b -1 -p err
# Correlate application and kernel logs for the same time window
journalctl -u myapp.service -k --since "03:00" --until "03:05"
# Output as JSON for log aggregation pipelines
journalctl -u myapp.service -o json --no-pager | jq '.MESSAGE'
# Check journal disk usage
journalctl --disk-usage
# Show all unique unit names that have logged errors
journalctl -p err -o json --no-pager | jq -r '._SYSTEMD_UNIT' | sort -u
Journal entries have both implicit fields (added by journald: _PID, _UID, _SYSTEMD_UNIT, _BOOT_ID, _TRANSPORT) and explicit fields (sent by the application via sd_journal_send()). Applications that use sd_journal_send() can attach structured metadata like CODE_FILE, CODE_LINE, REQUEST_ID, or any custom key-value pair.
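The implicit/explicit split is visible directly in journalctl -o json output, where one JSON object is emitted per entry. A sketch of separating the two (the sample entry below is fabricated for the demo):

```python
import json

# One line of `journalctl -o json` output (fabricated sample entry).
line = ('{"_SYSTEMD_UNIT":"nginx.service","_PID":"1234","_BOOT_ID":"abc",'
        '"PRIORITY":"3","MESSAGE":"upstream timed out"}')

entry = json.loads(line)
# Fields starting with "_" are implicit (stamped by journald and trusted);
# the rest are explicit, supplied by the logging application.
implicit = {k: v for k, v in entry.items() if k.startswith("_")}
explicit = {k: v for k, v in entry.items() if not k.startswith("_")}
print(sorted(implicit), sorted(explicit))
```

The trust distinction matters for forensics: an application can claim any MESSAGE it likes, but it cannot forge _PID or _SYSTEMD_UNIT.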
sd-bus and D-Bus Integration
systemctl does not directly manipulate processes. It sends D-Bus method calls to PID 1. When running systemctl restart nginx, the tool connects to the system bus, calls the RestartUnit method on the org.freedesktop.systemd1.Manager interface, and PID 1 executes the restart.
sd-bus is systemd's own D-Bus client library, built as a lighter and faster replacement for the reference libdbus implementation.
# List all services registered on the system bus
busctl list
# Show the object tree for systemd's D-Bus interface
busctl tree org.freedesktop.systemd1
# Call a method directly (equivalent to systemctl start nginx)
busctl call org.freedesktop.systemd1 /org/freedesktop/systemd1 \
org.freedesktop.systemd1.Manager StartUnit ss "nginx.service" "replace"
# Introspect available methods and properties
busctl introspect org.freedesktop.systemd1 /org/freedesktop/systemd1
Bus activation works like socket activation: when a D-Bus message arrives for a service that is not running, dbus-daemon (or sd-bus) tells systemd to start the service first.
Debugging the Dependency Cycle Problem
The problem statement: "Job for payment-gateway.service deleted to break ordering cycle."
Step-by-step diagnosis:
# Step 1: Check the unit status for clues
systemctl status payment-gateway.service
# Step 2: Run the cycle detector
systemd-analyze verify payment-gateway.service
# Output will list the cycle edges
# Step 3: Show the effective unit file including all drop-in overrides
systemctl cat payment-gateway.service
# Step 4: Check what drop-in files exist for related units
systemd-delta --type=extended
# Step 5: Dump the full dependency graph and search for the cycle
systemd-analyze dot | grep -E 'payment-gateway|dbus|NetworkManager'
# Step 6: Check journal for the exact cycle-break message
journalctl -b -u systemd --grep="ordering cycle" --no-pager
# Step 7: After identifying the offending drop-in, remove it
rm /etc/systemd/system/dbus.service.d/99-payment-gateway.conf
systemctl daemon-reload
systemctl start payment-gateway.service
The root cause pattern: a drop-in file at /etc/systemd/system/dbus.service.d/99-payment-gateway.conf contains Wants=payment-gateway.service. Since payment-gateway already has After=dbus.service (transitively via network-online.target), this creates a cycle: dbus wants payment-gateway, payment-gateway must start after dbus. Systemd breaks the cycle by deleting the weaker Wants job, and payment-gateway never starts.
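Based on the description above, the offending drop-in would contain something like this (content reconstructed for illustration):

```ini
# /etc/systemd/system/dbus.service.d/99-payment-gateway.conf
# The misconfigured drop-in: pulls payment-gateway into dbus's transaction
# even though payment-gateway transitively orders itself After=dbus.service.
[Unit]
Wants=payment-gateway.service
```

Drop-ins merge into the unit they shadow, so this single line makes dbus.service an activation parent of payment-gateway.service and closes the loop.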
Boot Performance Analysis
# Total boot time breakdown (firmware, loader, kernel, userspace)
systemd-analyze
# Services sorted by startup time
systemd-analyze blame
# Critical chain (the longest sequential dependency path)
systemd-analyze critical-chain
# Plot boot as an SVG timeline
systemd-analyze plot > boot-timeline.svg
# Find services that slow down boot the most
systemd-analyze critical-chain multi-user.target
The critical chain is the path that determines total boot time. Optimizing a service that is not on the critical chain does not improve boot time. Common offenders: NetworkManager-wait-online.service (waits for DHCP), systemd-fsck (filesystem checks), and poorly configured Type=forking services with slow startup.
Common Questions
What happens if a Requires dependency fails?
If service A has Requires=B and After=B, and B fails to start, systemd stops A as well. With Wants=B instead, A continues even if B fails. The choice between Requires and Wants determines whether the dependency is hard or soft.
How does Type=notify differ from Type=simple?
Type=simple considers the service started as soon as the ExecStart process is forked. Type=notify waits until the service sends sd_notify(0, "READY=1") via its notification socket. This allows the service to complete initialization (open database connections, load configuration, bind ports) before dependents are started. PostgreSQL, for example, uses Type=notify so that services with After=postgresql.service do not start until the database is actually accepting connections.
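The readiness protocol itself is tiny: a datagram containing "READY=1" sent to the unix socket named in NOTIFY_SOCKET. A minimal sd_notify() sketch where we stand in for systemd by creating the socket ourselves (the socket path is arbitrary for the demo):

```python
import os
import socket
import tempfile

sock_path = os.path.join(tempfile.mkdtemp(), "notify.sock")

# "systemd" side: bind the notification socket and wait for datagrams.
manager = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
manager.bind(sock_path)

def sd_notify(state, notify_socket):
    """Send a state string ('READY=1', 'STATUS=...') to the manager."""
    s = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    s.sendto(state.encode(), notify_socket)
    s.close()

# "service" side: finish initialization, then signal readiness.
os.environ["NOTIFY_SOCKET"] = sock_path
sd_notify("READY=1", os.environ["NOTIFY_SOCKET"])

msg = manager.recv(4096).decode()
print(msg)   # READY=1
manager.close()
```

Until that datagram arrives, systemd keeps the unit in "activating" state and holds back every job ordered After= it.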
Can a service be in multiple slices?
No. Each service belongs to exactly one slice. The slice is determined by the Slice= directive in the unit file (default: system.slice). Services within the same slice share a common cgroup parent, and the slice's resource limits apply as an aggregate cap across all its members.
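A custom slice and a service that joins it might look like this (the slice name and limits are hypothetical):

```ini
# /etc/systemd/system/batch.slice -- hypothetical slice for batch workloads
[Unit]
Description=Resource-capped batch workloads

[Slice]
CPUWeight=50
MemoryMax=4G
```

A service opts in with Slice=batch.slice in its [Service] section; the 4G cap then applies to all members of the slice combined, not to each one individually.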
What is the difference between systemctl restart and systemctl reload?
Restart stops the service (SIGTERM, wait, SIGKILL the cgroup) and starts it again. Reload sends the signal defined in ExecReload= (usually SIGHUP) to the main process, which re-reads its configuration without stopping. Not all services support reload. Check with systemctl show -p CanReload nginx.service.
How Technologies Use This
A production host runs dockerd as a systemd service managing 60 containers. The dockerd process crashes due to an OOM condition at 3 AM, and all 60 containers become unmanageable until someone manually restarts the daemon. Configuring dockerd.service with Restart=always and RestartSec=5s tells systemd to automatically relaunch the daemon within 5 seconds of any exit, whether from a crash, OOM kill, or unexpected signal.
Systemd also manages dockerd's socket through socket activation. The unit file docker.socket creates the /var/run/docker.sock Unix socket independently of the dockerd process. When a client (docker CLI, a CI pipeline, or a monitoring agent) connects to the socket while dockerd is not running, the kernel queues the connection in the socket backlog. Systemd detects the incoming connection and starts dockerd.service, passing the already-open file descriptor via the LISTEN_FDS environment variable and the sd_listen_fds() library call. The daemon picks up the queued connection without any client-side retry logic.
The combination of Restart=always and socket activation means that a dockerd crash at 3 AM results in at most 5 seconds of unavailability. Clients connecting during the restart window have their connections queued in the kernel socket backlog (default depth 128) rather than receiving ECONNREFUSED. The systemd journal captures the crash backtrace, OOM score, and exit code via journalctl -u docker.service, providing full forensics without any external log shipping.
A Kubernetes node runs kubelet as a systemd service managing 110 pods. The kubelet needs direct control over cgroup hierarchies to enforce per-pod CPU and memory limits, but systemd also manages cgroups for every service it supervises. Without explicit delegation, systemd and kubelet fight over the same cgroup tree, causing pods to lose their resource limits or kubelet to fail cgroup operations with EPERM.
Setting Delegate=yes in the kubelet.service unit file tells systemd to grant kubelet full ownership of its cgroup subtree. Systemd creates the kubelet's cgroup (typically system.slice/kubelet.service), then steps back and allows kubelet to create child cgroups, write to controllers (cpu.max, memory.max, pids.max), and manage the entire hierarchy below its own cgroup. Without Delegate=yes, systemd periodically resets the cgroup configuration to match its own unit file parameters, undoing the limits kubelet set for individual pods.
The kubelet.service unit also typically includes CPUAccounting=yes, MemoryAccounting=yes, and Slice=system.slice to ensure that kubelet itself is tracked within the cgroup hierarchy. Running `systemctl show kubelet.service -p Delegate,CPUAccounting,MemoryAccounting` confirms the delegation is active. On nodes where Delegate=yes is missing, symptoms include pods ignoring their memory limits (because systemd overwrites memory.max) and kubelet log entries showing "failed to set cgroup config" errors during pod creation.
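A drop-in enabling delegation might look like this (an illustrative fragment; real kubelet packaging varies by distribution):

```ini
# /etc/systemd/system/kubelet.service.d/10-delegate.conf -- illustrative drop-in
[Service]
Delegate=yes
CPUAccounting=yes
MemoryAccounting=yes
```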
A server running 12 systemd services fails to boot after a package update. Running `systemctl status` shows payment-gateway.service as "failed" with the message "Job deleted to break ordering cycle." The dependency graph has a circular chain: payment-gateway requires network-online.target, which pulls in NetworkManager-wait-online.service, which depends on dbus.service, which has a Wants= on payment-gateway.service added by a misconfigured drop-in file.
Running `systemd-analyze verify payment-gateway.service` detects and reports the dependency cycle before the next boot. The verify subcommand loads all referenced unit files, builds the dependency graph, and reports any cycles, missing units, or invalid directives. It also catches typos in After=, Requires=, and Wants= lines that reference nonexistent units. On a system with 200+ unit files, verify can check a single service and its transitive dependencies in under 2 seconds.
Once the cycle is identified, `systemd-analyze dot payment-gateway.service | dot -Tsvg > deps.svg` renders the full dependency graph as an SVG image, making the circular path visually obvious. The fix is to remove the spurious Wants=payment-gateway.service from the dbus.service drop-in directory (/etc/systemd/system/dbus.service.d/). Running `systemd-analyze verify` again after the fix confirms the cycle is resolved. Integrating this verify step into the deployment pipeline catches dependency cycles before they reach production and prevents them from ever blocking service startup.
Same Concept Across Tech
| Technology | How it uses systemd | Key gotcha |
|---|---|---|
| Docker/containerd | Delegates cgroup management to systemd (--cgroup-driver=systemd). Each container gets a scope unit under system.slice | Mismatch between container runtime cgroup driver and kubelet cgroup driver causes resource accounting errors |
| Kubernetes (kubelet) | Runs as a systemd service. Uses systemd cgroup driver to create pod-level cgroups. Relies on systemd for node-level resource isolation | kubelet must match the container runtime's cgroup driver setting. Mixed cgroupfs/systemd drivers break resource limits |
| cloud-init | Runs as four ordered systemd services (local, network, config, final). Uses After= and Wants= for boot ordering | Application services that start before cloud-final.target may run before user-data scripts finish configuring the environment |
| PostgreSQL | Ships with a systemd service unit. Uses Type=notify to signal readiness via sd_notify(). Relies on cgroup isolation for shared hosting | Must use After=network-online.target if the database listens on network interfaces. Type=forking is wrong for modern PostgreSQL |
| NGINX | Uses Type=forking with PIDFile= for the master process. Socket activation possible via nginx.socket for zero-downtime upgrades | Reload (systemctl reload) sends SIGHUP for graceful config reload. Restart kills workers and drops in-flight connections |
Stack layer mapping (service fails to start after dependency cycle):
| Layer | What to check | Tool |
|---|---|---|
| Unit configuration | Which drop-in files added unexpected dependencies? | systemctl cat, systemd-delta |
| Dependency graph | Where is the cycle in the graph? | systemd-analyze verify, systemd-analyze dot |
| Transaction engine | Which job was deleted to break the cycle? | journalctl -b -u systemd --grep="ordering cycle" |
| Generator output | Did a generator (e.g., systemd-fstab-generator) add implicit dependencies? | ls /run/systemd/generator/ |
| Boot timeline | What else was affected by the broken dependency? | systemd-analyze critical-chain |
Design Rationale
SysVinit started services sequentially via numbered shell scripts in /etc/rc.d/. Boot took minutes because every service waited for the previous one. Upstart introduced event-based parallelism but still relied on ad-hoc scripts. Systemd took a different approach: declare dependencies explicitly, let the init system build a graph, start everything in parallel, and use socket activation to eliminate false serialization. Service A does not need to wait for service B to be ready; it just needs B's socket to exist. The socket can exist before B's process even starts. This insight -- that most service dependencies are really socket dependencies -- is what makes systemd boots fast. The cgroup integration ensures no process escapes tracking, the journal ensures no log line is lost, and the declarative unit files ensure the configuration is parseable by tools, not just by bash.
If You See This, Think This
| Symptom | Likely cause | First check |
|---|---|---|
| "Job deleted to break ordering cycle" | Circular dependency in the unit graph | systemd-analyze verify to identify the cycle edges |
| Service starts before database is ready | Missing After= directive (Requires= alone does not order) | systemctl cat to check for After= alongside Requires= |
| Service marked as failed but process is still running | Type=simple but the binary forks/daemonizes | Change to Type=forking with PIDFile= or remove daemonization |
| Orphaned processes survive service restart | Service spawning children outside the main cgroup (rare with systemd) | systemd-cgls to check cgroup membership |
| Boot takes 90+ seconds | A service blocking the critical chain (often network-online.target) | systemd-analyze critical-chain and systemd-analyze blame |
| Journal consuming excessive disk space | No size limits configured in journald.conf | journalctl --disk-usage; set SystemMaxUse= in /etc/systemd/journald.conf |
| Socket activation not working | Socket unit and service unit names do not match | Ensure webapp.socket activates webapp.service (or set Service= explicitly) |
| Service restarts in a loop | Restart=always with a crash bug; StartLimitBurst reached | journalctl -u service to see crash reason; check StartLimitIntervalSec and StartLimitBurst |
When to Use / Avoid
Relevant when:
- Debugging service startup failures, dependency ordering, or boot performance
- Configuring resource limits (CPU, memory, I/O) per service without containers
- Setting up socket-activated services for on-demand startup or zero-downtime restarts
- Replacing cron jobs with systemd timers for better logging and dependency management
- Investigating why a process survived a service stop (cgroup tracking)
Watch out for:
- Dependency cycles from drop-in files or generator output that add unexpected edges to the graph
- Type=simple with forking daemons causes systemd to lose track of the main PID
- Journal disk usage growing unbounded without SystemMaxUse= in journald.conf
- Services without MemoryMax= can consume all host memory before systemd-oomd intervenes
Try It Yourself
# Show the full boot timeline sorted by startup duration
systemd-analyze blame
# Display the critical chain (longest sequential path during boot)
systemd-analyze critical-chain
# Verify a unit file for syntax errors and dependency cycles
systemd-analyze verify /etc/systemd/system/myservice.service
# List all dependencies of a service, including transitive ones
systemctl list-dependencies nginx.service --all
# Show effective unit file including all drop-in overrides
systemctl cat nginx.service
# Export the dependency graph as an SVG image
systemd-analyze dot --to-pattern='*.service' | dot -Tsvg -o services.svg
# Check resource usage of a running service
systemctl show nginx.service -p MemoryCurrent,CPUUsageNSec,TasksCurrent
# Follow live journal output for a specific service
journalctl -u nginx.service -f
# Show logs from the previous boot for a crashed service
journalctl -b -1 -u myservice.service -p err
# List all active timers and their next trigger time
systemctl list-timers --all
# View the cgroup hierarchy for all services
systemd-cgls --no-pager
# Show all units that changed from the vendor defaults
systemd-delta
# Inspect D-Bus methods exposed by systemd
busctl introspect org.freedesktop.systemd1 /org/freedesktop/systemd1
# Check for masked units that silently prevent startup
systemctl list-unit-files --state=masked
# Create a transient cgroup-limited scope for a one-off command
systemd-run --scope -p MemoryMax=512M -p CPUQuota=50% ./heavy-task
Debug Checklist
1. Check unit status and recent logs: systemctl status myservice.service
2. Show full dependency tree: systemctl list-dependencies myservice.service --all
3. Detect dependency cycles: systemd-analyze verify myservice.service
4. Visualize boot critical path: systemd-analyze critical-chain myservice.service
5. Export dependency graph: systemd-analyze dot | dot -Tsvg -o deps.svg
6. List all failed units: systemctl --failed
7. Show effective unit file with overrides: systemctl cat myservice.service
8. Check cgroup resource usage: systemctl show myservice.service -p MemoryCurrent,CPUUsageNSec,TasksCurrent
9. View logs for a specific boot: journalctl -b -1 -u myservice.service
10. Check for masked or disabled units: systemctl list-unit-files --state=masked,disabled
Key Takeaways
- ✓ Systemd parallelizes boot by starting all units simultaneously and letting socket dependencies serialize only what must be serialized. Service A does not wait for service B to finish starting; it waits for B's socket to exist. If B is socket-activated, A can start before B's process even launches.
- ✓ The dependency graph has two separate concepts that are often confused. Requires/Wants define what gets pulled in (activation dependencies). After/Before define startup ordering. They are independent. Requires=B without After=B means A and B start in parallel. Most configurations need both.
- ✓ Every service runs in its own cgroup. This is not optional. When systemctl stop is called, systemd sends SIGTERM to the main process, waits TimeoutStopSec, then sends SIGKILL to every process in the cgroup. No orphaned child process survives a service stop, unlike SysVinit where daemonized children could escape the PID file tracking.
- ✓ Socket activation decouples socket lifetime from service lifetime. The .socket unit creates and binds the listening socket. The .service unit inherits the file descriptor. Between service restarts, the kernel keeps the socket open and buffers connections. This is why systemd can restart dbus.service without breaking every D-Bus client.
- ✓ The journal is append-only and structured. Each entry has implicit fields (_PID, _UID, _SYSTEMD_UNIT, _BOOT_ID) added by journald, plus explicit fields from the application. Binary format enables O(log n) seeks by timestamp, unlike grep on text log files.
- ✓ Target units replaced runlevels but are more flexible. A target can depend on other targets, creating a tree. emergency.target pulls in almost nothing. multi-user.target pulls in networking, logging, cron, and all enabled services. graphical.target adds the display manager on top of multi-user.target.
Common Pitfalls
- ✗ Confusing Requires= with After=. Writing Requires=database.service without After=database.service means both services start in parallel. The application crashes because the database is not ready yet. The fix is to add both directives, or use socket activation so the application connects to the database socket, which exists before the database process is fully initialized.
- ✗ Creating dependency cycles with drop-in files. A drop-in in /etc/systemd/system/dbus.service.d/ that adds Wants=myapp.service, combined with myapp.service having After=dbus.service, creates a cycle. Systemd silently breaks the cycle and deletes the weakest job. The service fails to start with a cryptic "deleted to break ordering cycle" message. Use systemd-analyze verify to catch cycles before deploying.
- ✗ Using Type=simple for a service that forks. If the service binary daemonizes itself (double fork), systemd considers the main PID to have exited and marks the service as failed. Use Type=forking with PIDFile= for legacy daemons, or better, remove the daemonization code and let systemd manage the process lifecycle directly.
- ✗ Not setting resource limits and then wondering why a misbehaving service caused an OOM kill on unrelated processes. Without MemoryMax=, a service can consume all available memory. Systemd's default cgroup placement helps systemd-oomd identify the culprit, but explicit limits prevent the problem entirely.
- ✗ Ignoring the journal and relying solely on application log files. When a service crashes before writing to its log file, the journal still captures its stderr, the exit code, the signal that killed it, and any kernel messages from the same moment. Running journalctl -u myservice -p err --since "1 hour ago" surfaces failures that never made it to application logs.
Reference
In One Line
Systemd boots Linux by building a dependency graph of units, starting everything in parallel where sockets allow, isolating each service in its own cgroup, and logging every event in a structured journal that makes grep pipelines obsolete.