Audit Framework & Logging

Mental Model

A building where every room has a ceiling-mounted camera run by management, not the tenants. Tenants keep their own logbooks -- skip entries, rip out pages, write whatever they want. The cameras do not care. They record automatically, and the footage goes to a locked room tenants cannot touch. At the front door, each person gets a wristband with their real name. Borrow someone else's badge inside, swap hats, change coats -- the camera still reads the wristband. Management can even make the system tamper-proof: once locked, not even the security guards can turn it off without rebooting the whole building.

The Problem

Someone drops a production database via sudo at 3 AM. Three operators had root access that night, and every record shows uid=0 -- no way to tell who actually did it. The application logs are blank because the command ran directly on the host, and anyone with root can edit log files at will. On another node, a container quietly calls setns() to slip into the host PID namespace; nothing is logged because the syscall fires below anything application-level monitoring can see. Then a PCI DSS Requirement 10 audit lands, asking for proof of exactly who accessed cardholder data last month across 200 servers. Without kernel-level tracking, the honest answer is "unknown."

Architecture

It is 3 AM. Someone runs rm -rf /var/lib/etcd/ on a production Kubernetes node. By morning, the cluster is broken and nobody is talking.

The sysadmin who triaged the incident checks /var/log/audit/audit.log. There it is: type=EXECVE, the full command line, the process ID, the effective UID (0, because sudo), and -- critically -- auid=1003. That is the audit UID. It does not change through sudo. It does not change through su. It was stamped on this session when the human logged in, and it followed every privilege transition silently.

auid=1003 is Alice. She ran the command. The audit log proves it. No application-level log could have captured this with the same certainty.

What Actually Happens

The audit framework has two halves: a kernel component and a userspace daemon.

In the kernel, the audit subsystem hooks syscall entry and exit points, LSM decisions (SELinux/AppArmor), and filesystem watch triggers. Userspace programs such as PAM also submit authentication events through the same channel. When a security-relevant event occurs, the kernel evaluates it against the configured audit rules, and if a rule matches, it builds an event record. A kernel thread, kauditd, drains the queued records to userspace.

That record is thorough. It includes the syscall number, the first four arguments (a0 through a3), the return value, process credentials (UID, GID, auid, session ID), file paths, the current working directory, and the SELinux/AppArmor context. The record is queued in a kernel backlog buffer.

In userspace, auditd receives events from kauditd via a NETLINK_AUDIT socket and writes them to /var/log/audit/audit.log. It handles log rotation, can forward events to remote aggregators via the audisp-remote plugin, and can be configured to halt the system rather than lose audit data.

Each syscall generates multiple related records sharing the same event ID: SYSCALL (core data), CWD (working directory), PATH (each file involved), PROCTITLE (full command line), and optionally EXECVE (program arguments) and SOCKADDR (network addresses). ausearch correlates these by event ID. aureport summarizes by category.
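To make the shared event ID concrete, here is a minimal shell sketch. The records are hand-written illustrative samples rather than output captured from a real system, and event_id and records_per_event are hypothetical helper names. The only format detail relied on is the msg=audit(<epoch>.<ms>:<serial>) stamp that each record carries.

```shell
#!/bin/sh
# Illustrative records only: values are made up, the layout mimics audit.log.
sample_log='type=SYSCALL msg=audit(1700000000.123:456): arch=c000003e syscall=59 success=yes exit=0 pid=4242 auid=1003 uid=0 euid=0 comm="rm" key="user_commands"
type=CWD msg=audit(1700000000.123:456): cwd="/root"
type=PATH msg=audit(1700000000.123:456): item=0 name="/usr/bin/rm"
type=SYSCALL msg=audit(1700000005.001:457): arch=c000003e syscall=59 success=yes exit=0 pid=4250 auid=1003 uid=0 euid=0 comm="ls" key="user_commands"'

# event_id (hypothetical helper): print the "<timestamp>:<serial>" stamp
# of each record read from stdin.
event_id() {
    sed -n 's/.*msg=audit(\([0-9.]*:[0-9]*\)).*/\1/p'
}

# records_per_event: count how many records share each stamp, which is the
# grouping that ausearch performs when it reassembles an event.
records_per_event() {
    event_id | sort | uniq -c | awk '{print $2, $1}'
}

printf '%s\n' "$sample_log" | records_per_event
```

On the sample data the first event is reported with three records (SYSCALL, CWD, PATH) and the second with one.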

Under the Hood

Audit rules come in two flavors.

File system watches monitor specific files or directories: -w /etc/shadow -p wa -k shadow_watch. The -p wa means trigger on write (w) or attribute change (a). The -k shadow_watch tags the event for easy searching later. These use inotify-like kernel hooks.

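System call rules match syscalls with filter conditions: -a always,exit -F arch=b64 -S execve -F auid>=1000 -k user_commands. This captures every command execution by a real user (auid >= 1000 excludes system accounts). Because an unset auid is stored as 4294967295, which also satisfies auid>=1000, production rules usually add -F auid!=4294967295 to skip processes that have no login session.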
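The two rule flavors can be sketched as a small rules file. This is a sketch under assumptions: the scratch file stands in for a file under /etc/audit/rules.d/, list_keys is a hypothetical helper, and the extra auid!=4294967295 filter is a common addition that excludes processes with no login session.

```shell
#!/bin/sh
# Write both rule flavors into a scratch file
# (a stand-in for a file under /etc/audit/rules.d/).
rules_file=$(mktemp)
cat > "$rules_file" <<'EOF'
# File system watch: writes and attribute changes to /etc/shadow
-w /etc/shadow -p wa -k shadow_watch
# Syscall rule: every execve by a logged-in user (auid 4294967295 means "unset")
-a always,exit -F arch=b64 -S execve -F auid>=1000 -F auid!=4294967295 -k user_commands
EOF

# list_keys (hypothetical helper): print the -k tag of every rule, skipping comments.
list_keys() {
    grep -v '^#' "$1" | sed -n 's/.* -k \([^ ]*\).*/\1/p'
}

list_keys "$rules_file"

# On the command line the comparison must be quoted, otherwise the shell
# treats ">" as an output redirect:
#   sudo auditctl -a always,exit -F arch=b64 -S execve -F 'auid>=1000' -k user_commands
```

Inside a rules file no quoting is needed, because auditctl reads the lines itself rather than through a shell.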

The auid field is what makes the audit framework uniquely powerful. It is set once by PAM at login (written to /proc/self/loginuid) and never changes for the life of the session. User alice (UID 1000) runs sudo su -. The shell is now UID 0. She runs su bob. Now UID 1001. Through every transition, the auid stays 1000. The audit record shows uid=1001 auid=1000 -- alice did it.
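A quick way to see the login UID for the current session is to read /proc/self/loginuid. The sketch below assumes only that file and the 4294967295 "unset" marker; describe_loginuid is a hypothetical helper name.

```shell
#!/bin/sh
# describe_loginuid (hypothetical helper): interpret a loginuid value.
# 4294967295 is (uint32)-1, the kernel's marker for "no login session".
describe_loginuid() {
    if [ "$1" = "4294967295" ]; then
        echo "unset"
    else
        echo "login uid $1"
    fi
}

# The file may be absent on kernels built without audit support,
# so guard the read instead of assuming it exists.
if [ -r /proc/self/loginuid ]; then
    describe_loginuid "$(cat /proc/self/loginuid)"
else
    echo "loginuid not available on this system"
fi
```

Running it before and after sudo -s should print the same login UID even though id -u changes.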

Immutable rules add another layer of protection. Adding -e 2 as the last line of the rules file locks the audit configuration until reboot. An attacker who gains root cannot change the rules or switch auditing off without rebooting. Combined with remote log forwarding, this creates a tamper-resistant audit trail.
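Ordering matters with immutable mode: once -e 2 takes effect, any rule loaded after it is refused. The following sketch checks that ordering on two scratch files. immutable_is_last is a hypothetical helper, not part of the audit tooling.

```shell
#!/bin/sh
# immutable_is_last (hypothetical helper): succeed only if "-e 2" is the last
# effective (non-blank, non-comment) line of a rules file.
immutable_is_last() {
    last=$(grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$1" | tail -n 1)
    [ "$last" = "-e 2" ]
}

# Correct order: rules first, lock last.
good=$(mktemp)
printf '%s\n' '-w /etc/shadow -p wa -k shadow_watch' '-e 2' > "$good"

# Wrong order: the lock is applied before the watch is loaded.
bad=$(mktemp)
printf '%s\n' '-e 2' '-w /etc/shadow -p wa -k shadow_watch' > "$bad"

immutable_is_last "$good" && echo "good: rules load, then lock"
immutable_is_last "$bad"  || echo "bad: lock comes first, later rules would be rejected"
```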

The backlog is a fixed-size kernel buffer. The kernel's built-in limit is only 64 records; the rules shipped with auditd commonly raise it to 8192 with -b 8192. If auditd cannot consume events fast enough, the backlog fills. What happens next depends on configuration: failure=0 silently drops events, failure=1 prints kernel warnings, failure=2 panics the kernel. That last option exists because in some environments, losing audit data is worse than downtime. Setting audit=1 on the kernel command line enables auditing before init runs, capturing early boot events.
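Backlog pressure can be estimated from the status counters. The sketch below works on an illustrative status text in the "key value" layout that recent auditctl versions print (older versions use a different layout, so treat the parsing as an assumption). status_field and backlog_percent are hypothetical helpers, and the numbers are invented.

```shell
#!/bin/sh
# Illustrative status text; the numbers are invented for the example.
sample_status='enabled 1
failure 1
backlog_limit 8192
lost 12
backlog 6144'

# status_field (hypothetical helper): print the value for one key.
status_field() {
    awk -v key="$1" '$1 == key {print $2}'
}

# backlog_percent: how full the kernel queue is, as an integer percentage.
backlog_percent() {
    status=$(cat)
    used=$(printf '%s\n' "$status" | status_field backlog)
    limit=$(printf '%s\n' "$status" | status_field backlog_limit)
    echo $(( used * 100 / limit ))
}

printf '%s\n' "$sample_status" | backlog_percent   # 6144 * 100 / 8192 = 75
printf '%s\n' "$sample_status" | status_field lost
```

On a live host the same helpers could be fed from sudo auditctl -s. A non-zero lost counter means events have already been dropped.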

Common Questions

How does audit handle log integrity?

The log file itself is plain text. For tamper evidence, forward events in real-time to a remote SIEM (Splunk, ELK, Wazuh) via the audisp-remote plugin. auditd can be configured with disk_full_action=halt (stop the system rather than lose data) and admin_space_left_action=single (switch to single-user mode). The -e 2 immutable mode prevents an attacker from disabling rules even with root access.
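The two auditd settings named above live in auditd.conf as "key = value" lines. The sketch below writes them to a scratch file (standing in for /etc/audit/auditd.conf) and reads them back. conf_value is a hypothetical helper.

```shell
#!/bin/sh
# Scratch copy of the relevant settings (the real file is /etc/audit/auditd.conf).
conf=$(mktemp)
cat > "$conf" <<'EOF'
# Stop the machine rather than run without audit logging
disk_full_action = halt
# Drop to single-user mode when the admin low-space threshold is hit
admin_space_left_action = single
EOF

# conf_value (hypothetical helper): print the value of one "key = value" setting,
# ignoring comment lines and surrounding whitespace.
conf_value() {
    awk -F'=' -v key="$2" '
        /^[[:space:]]*#/ { next }
        { k = $1; v = $2; gsub(/[[:space:]]/, "", k); gsub(/[[:space:]]/, "", v) }
        k == key { print v }
    ' "$1"
}

conf_value "$conf" disk_full_action
conf_value "$conf" admin_space_left_action
```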

How does auid differ from uid in audit records?

The uid and euid fields show the real and effective UIDs at the time of the syscall -- they change with sudo, su, and setuid binaries. The auid is set once at login and never changes. If alice (UID 1000) runs sudo su -, the shell has uid=0 auid=1000. Every action traces back to alice regardless of privilege transitions. ausearch -ul 1000 finds everything attributed to her login UID (-ua would also match records where 1000 appears as uid or euid).
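The difference is easy to see by pulling the three fields out of a single record. The record below is an invented sample matching the alice example, and get_field is a hypothetical helper.

```shell
#!/bin/sh
# Illustrative SYSCALL record (values invented): alice (1000) after "sudo su -".
record='type=SYSCALL msg=audit(1700000000.123:456): arch=c000003e syscall=59 success=yes exit=0 pid=4242 auid=1000 uid=0 gid=0 euid=0 comm="bash" exe="/usr/bin/bash"'

# get_field (hypothetical helper): print the value of one key=value field.
# Splitting on spaces first keeps "uid" from matching inside "auid" or "euid".
get_field() {
    printf '%s\n' "$2" | tr ' ' '\n' | sed -n "s/^$1=//p"
}

echo "auid=$(get_field auid "$record") uid=$(get_field uid "$record") euid=$(get_field euid "$record")"
# prints: auid=1000 uid=0 euid=0
```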

What rules do compliance frameworks require?

PCI DSS Requirement 10 calls for: login/logout events, privilege escalation (-a always,exit -F arch=b64 -S execve -C uid!=euid -k priv_esc), access to cardholder data (-w /path/to/carddata -p rwxa -k pci_data), config changes (-w /etc/ -p wa -k config_change), and all admin actions. SOC 2 and HIPAA have similar requirements. The CIS Benchmarks for Linux include recommended audit rules that are a common starting point.

How are audit logs analyzed during incident response?

Start with ausearch -ts <timestamp> -te <end_time> to bound the incident window. Use -m EXECVE -i to see executed commands in human-readable form. -k <key> narrows to specific rule categories. -ul <auid> filters by the original human's login UID. aureport -x --summary gives a high-level view. For deep analysis, export with ausearch --format csv for processing in pandas or a SIEM.
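When only raw log lines are at hand (for example, a copy pulled from a remote collector), the incident window can be bounded directly on the epoch stamp in each record. This is a stand-in for ausearch -ts/-te, not a replacement. The records are invented samples and events_between is a hypothetical helper.

```shell
#!/bin/sh
# Illustrative records (invented values); only the
# msg=audit(<epoch>.<ms>:<serial>) stamp matters here.
sample_log='type=EXECVE msg=audit(1700000000.100:501): argc=2 a0="ls" a1="-l"
type=EXECVE msg=audit(1700003600.200:502): argc=3 a0="rm" a1="-rf" a2="/var/lib/etcd/"
type=EXECVE msg=audit(1700007200.300:503): argc=1 a0="whoami"'

# events_between (hypothetical helper): keep records whose epoch second
# lies in the closed interval [start, end].
events_between() {
    awk -v start="$1" -v end="$2" '
        match($0, /msg=audit\([0-9]+/) {
            # "msg=audit(" is 10 characters long; the digits follow it.
            epoch = substr($0, RSTART + 10, RLENGTH - 10) + 0
            if (epoch >= start && epoch <= end) print
        }'
}

printf '%s\n' "$sample_log" | events_between 1700003000 1700004000
```

Only the middle record (serial 502) falls inside the window in this sample.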

How Technologies Use This

systemd

A production database was dropped via sudo at 3 AM and three operators had root access that night. The application logs show nothing because the command was run directly on the host. The audit log shows uid=0 for every sudo action, making attribution impossible.

The problem is that sudo changes the effective UID to 0, and without kernel-level identity tracking, every action taken as root looks identical. Application logs can be tampered with by anyone who has root, and traditional Unix logging has no concept of the original human behind a privilege escalation chain.

pam_loginuid.so sets an immutable audit UID at login that persists through sudo, su, and setuid transitions. When Alice (uid 1003) runs sudo su - followed by destructive commands, every audit record carries auid=1003 regardless of effective uid. systemd-journald indexes these events for fast querying via journalctl _AUDIT_LOGINUID=1003, producing a complete timeline of one human's actions in under 500ms across millions of log entries.

Kubernetes

A container attempts to escape by calling setns() on a host namespace or loading a kernel module via finit_module(). These escape attempts generate zero logs because they happen at the syscall layer below application logging, and Kubernetes has no visibility into raw syscall activity.

The gap is that container-level monitoring tools operate above the syscall layer and cannot observe kernel-level escape techniques. A setns() call to join the host PID namespace or a finit_module() call to load a malicious kernel module bypasses all application-level logging. Without kernel-level audit rules, these actions are completely invisible.

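Audit rules on the setns, unshare, init_module, and finit_module syscalls, together with execve auditing filtered by -F auid>=1000, catch these actions with roughly 5-10us of overhead per audited syscall. Wildcard watches such as /proc/*/ns/* are not expanded by auditctl, so syscall rules are the reliable option. Falco covers the same ground for real-time alerting, processing up to 50,000 events per second, but it captures syscalls through its own kernel module or eBPF probe rather than the NETLINK_AUDIT socket. Either way, container escape attempts surface within milliseconds, before the attacker can pivot to the host.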
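A minimal sketch of syscall rules for these escape vectors follows. The key names (ns_change, module_load) and the audited_syscalls helper are illustrative choices, and the scratch file stands in for a file under /etc/audit/rules.d/ on the node.

```shell
#!/bin/sh
# Scratch rules file; on a real node this would live under /etc/audit/rules.d/.
rules_file=$(mktemp)
cat > "$rules_file" <<'EOF'
# Namespace changes: joining or creating namespaces
-a always,exit -F arch=b64 -S setns -S unshare -k ns_change
# Kernel module loading and unloading
-a always,exit -F arch=b64 -S init_module -S finit_module -S delete_module -k module_load
EOF

# audited_syscalls (hypothetical helper): list every syscall named with -S in the file.
audited_syscalls() {
    grep -v '^#' "$1" | tr ' ' '\n' | awk 'prev == "-S" {print} {prev = $0}'
}

audited_syscalls "$rules_file"
```

Rules like these fire for every process on the host, containerized or not, so the resulting events still need to be correlated with container metadata downstream.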

Same Concept Across Tech

Concept	Docker	JVM	Node.js	Go	K8s
Syscall auditing	Host audit rules apply to all container syscalls	JNI calls generate audit events for native syscalls	N/A (Node makes syscalls, host audit captures them)	N/A (Go makes syscalls, host audit captures them)	Falco monitors pod syscalls through its own kernel module or eBPF probe, alongside host audit rules
Identity tracking	auid persists into containers if PAM session set up	auid inherited from launching shell	auid inherited from launching shell	auid inherited from launching shell	Pod serviceAccountName maps to K8s audit; OS auid for node-level
Log integrity	Containers cannot access host /var/log/audit/	N/A (auditd is OS-level)	N/A (auditd is OS-level)	N/A (auditd is OS-level)	Forward audit logs to external SIEM (Splunk, ELK) from node
Escape detection	Audit setns(), mount(), ptrace() syscalls	N/A	N/A	N/A	Falco rules alert on namespace manipulation and module loading

Stack Layer Mapping

Layer	Component
Kernel hooks	syscall entry/exit, LSM hooks, filesystem watch triggers
kauditd	Kernel thread: evaluates rules, builds records, queues in backlog
NETLINK_AUDIT	Kernel-to-userspace transport for audit events
auditd	Writes /var/log/audit/audit.log, log rotation, remote forwarding
Analysis tools	ausearch (query), aureport (summarize), auditctl (configure)
SIEM	Splunk, ELK, Wazuh, Falco for alerting and long-term retention

Design Rationale: Application-level logging falls apart the moment a process with write access decides to edit the logs -- so audit lives in the kernel, out of reach. The auid exists to solve one specific problem: tracing actions back to a real human even after a chain of sudo and su transitions scrambles the effective UID. Immutable mode (-e 2) goes a step further, because in regulated environments losing audit data is considered worse than downtime.

If You See This, Think This

Symptom	Likely Cause	First Check
All sudo actions show uid=0, no way to attribute	auid not set -- PAM not configured with pam_loginuid.so	`cat /proc/self/loginuid` -- 4294967295 means unset
Audit log fills disk in hours	Overly broad rules auditing too many syscalls	`auditctl -l` -- remove -S all rules; use targeted syscalls
Syscalls blocked or system hangs under load	Kernel backlog overflow: processes stall for backlog_wait_time, or the kernel panics with failure=2	`auditctl -s` check backlog vs backlog_limit, and the lost counter
ausearch returns nothing for known event	Rule missing -k key tag or wrong time range	`ausearch -ts boot -m SYSCALL | tail` to verify events exist
Audit rules gone after reboot	Rules added via auditctl (temporary) not persisted	Move rules to /etc/audit/rules.d/*.rules and run augenrules --load
Container escape attempt not detected	No audit rules for setns(), finit_module(), or namespace files under /proc/*/ns/	Add syscall rules for the escape vectors (auditctl watches do not expand wildcards, so prefer syscall rules over /proc watches)

When to Use / Avoid

Use when compliance frameworks (PCI DSS, SOC 2, HIPAA) require provable tracking of who accessed what and when
Use when attribution through privilege escalation chains (sudo, su, setuid) is required
Use for detecting unauthorized modifications to critical system files (/etc/shadow, /etc/passwd, configs)
Use for container escape detection by auditing setns(), finit_module(), and namespace file access
Avoid auditing every syscall (-S all) -- generates millions of events per minute and overflows the backlog
Avoid on extreme-throughput systems where 5-10 microseconds per audited syscall is unacceptable

Try It Yourself

# Check audit system status (needs root)
sudo auditctl -s 2>/dev/null || echo 'auditctl not available or not permitted (install auditd)'

# List current audit rules
sudo auditctl -l 2>/dev/null || echo 'Requires root'

# Add a file watch rule (temporary, until reboot)
sudo auditctl -w /etc/passwd -p wa -k identity_watch 2>/dev/null && echo 'Rule added' || echo 'Cannot add rule'

# Search for recent file modification events
# (no "|| echo" fallback here: after a pipe it would only test head, not ausearch)
sudo ausearch -k identity_watch -ts today -i 2>/dev/null | head -30

# Generate a summary report of all authentication events
sudo aureport -au --summary 2>/dev/null | head -15

# Search for all execve events attributed to a specific login UID
sudo ausearch -m EXECVE -ul 1000 -ts today -i 2>/dev/null | head -20

Debug Checklist

1. auditctl -s
2. auditctl -l
3. ausearch -m EXECVE -ts today -i | head -30
4. aureport -x --summary
5. cat /proc/self/loginuid
6. ausearch -k <key_name> -ts recent

Key Takeaways

✓The audit UID (auid / loginuid) is the framework's killer feature. It is set once by PAM at login, written to /proc/self/loginuid, and never changes -- not through sudo, su, setuid, or container entry. When someone runs a destructive command as root, the auid field tells you which human actually logged in.
✓Audit rules support precise filtering: syscall number (-S), architecture (-F arch=b64), UID/GID (-F auid=1000), success/failure (-F success=0), file path (-w /etc/shadow), permissions (-p rwxa), and SELinux context (-F subj_type=httpd_t). Combine multiple filters in one rule to avoid noise.
✓Each syscall generates a multi-record event: SYSCALL (core data) + CWD (working directory) + PATH (each file touched) + PROCTITLE (command line) + optional EXECVE and SOCKADDR records. All share the same event ID. ausearch correlates them; aureport summarizes by category.
✓File watches (-w /etc/passwd -p wa -k identity) trigger on write and attribute changes using kernel hooks. They capture who changed the file, when, and from what process. This is the foundation for detecting unauthorized modifications to critical system files.
✓The audit backlog is a kernel buffer that can overflow under load. The kernel's built-in limit is 64 records, and the rules shipped with auditd commonly raise it to 8192. When the limit is exceeded, the kernel makes processes wait (slowing the system), drops events (losing audit data), or panics (in high-security environments), depending on the configured failure mode. Tuning the backlog limit and filtering rules is essential for production.

Common Pitfalls

✗Mistake: Adding overly broad rules like '-a always,exit -S all' that audit every syscall. Reality: This generates millions of events per minute, fills the log in seconds, and can overflow the kernel backlog buffer. Start with specific syscalls (execve, connect, openat) and targeted file paths.
✗Mistake: Not setting the -k (key) field on audit rules. Reality: Without keys, searching millions of events requires parsing full record content. Keys act as tags -- 'ausearch -k identity' instantly finds all events from the /etc/passwd watch. Always tag your rules.
✗Mistake: Forgetting that auditctl rules do not survive reboot. Reality: 'auditctl -w /etc/shadow -p wa' is temporary. Persistent rules go in /etc/audit/rules.d/ (e.g., 50-identity.rules). Run 'augenrules --load' to activate them.
✗Mistake: Ignoring performance impact of syscall audit rules. Reality: Each audited syscall adds about 5-10 microseconds of overhead for record generation. On a server making 100K syscalls/sec, broad rules add significant latency. Use filters (-F auid>=1000 to skip system accounts) to reduce volume.
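The overhead figure in the last pitfall is easier to judge as arithmetic. Taking the 5-10 microsecond estimate and the 100K syscalls/sec rate from the text at face value, the sketch below computes how many CPU milliseconds per second go to audit record generation if every one of those syscalls is audited. audit_cost_ms is a hypothetical helper.

```shell
#!/bin/sh
# audit_cost_ms (hypothetical helper): CPU milliseconds spent on audit records
# per wall-clock second, given audited syscalls per second ($1) and the
# per-syscall cost in microseconds ($2).
audit_cost_ms() {
    echo $(( $1 * $2 / 1000 ))
}

audit_cost_ms 100000 5    # 100,000 x 5 us  = 500 ms per second
audit_cost_ms 100000 10   # 100,000 x 10 us = 1000 ms per second
```

At the upper estimate that is a full core's worth of time, summed across all CPUs, which is why narrowing the rule set matters more than tuning the daemon.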

Reference

System Calls

audit_open, audit_add_rule

Tools

ausearch, aureport, auditctl -l

In One Line

auid traces every action back to the human who logged in -- sudo and su change the effective UID, but auid never lies.
