SELinux & AppArmor

Mental Model

A hospital where every staff member wears a colored badge and every room door lists which badge colors may enter. The janitor carries a master keycard -- root -- that opens any standard lock, but the badge system sits on a separate circuit entirely. Keycard accepted, wrong badge color? Door stays shut. SELinux works like permanent badge stamps on every person and every door. AppArmor works like a clipboard at each door listing permitted names by room number. Neither system cares about the keycard.

The Problem

An attacker exploits a web server bug and lands root. From there, DAC rolls out the red carpet: /etc/shadow is readable, every database file is open, lateral movement to other services takes under 60 seconds. On a shared-kernel host, Container A running as UID 0 reads Container B's database volume because DAC cannot tell one root from another -- 10,000+ user credentials leak. Meanwhile a 200-node Kubernetes cluster split between RHEL and Ubuntu has no unified MAC story, leaving compromised pods on unprotected nodes free to touch any host file they please.

Architecture

Root is supposed to be all-powerful. That is the whole point of root.

But on a properly configured RHEL box, a process running as root inside the httpd_t domain cannot read /etc/shadow. The kernel checks the file permissions -- root passes, of course. Then it checks the SELinux policy. No allow rule exists for httpd_t accessing shadow_t. Access denied. Root gets EACCES.

This is Mandatory Access Control. The kernel enforces rules that even root cannot override. And it is the reason a compromised web server on a MAC-enabled system is an incident, not a catastrophe.

What Actually Happens

When a process tries to open a file, the kernel runs two security checks in sequence.

First: DAC (Discretionary Access Control). Traditional file permissions. Owner, group, other. ACLs. This is the rwx check everyone knows. If DAC denies access, the operation fails immediately. MAC never even runs.

Second: MAC via LSM hooks. If DAC passes, the kernel hits a Linux Security Module hook -- security_inode_permission(). This is where SELinux or AppArmor makes its decision.
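The ordering can be sketched as a toy bash model. It assumes a heavily simplified DAC (owner and "other" read bits only, no groups or ACLs, root always passing as if through CAP_DAC_OVERRIDE). A plain yes/no flag stands in for the SELinux or AppArmor decision. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Toy model of the order of checks on a read: DAC first, then the MAC hook.
# Simplified DAC: owner and "other" read bits only; no groups, no ACLs.

dac_allows_read() {
  local uid=$1 owner=$2 mode=$3
  (( uid == 0 )) && return 0                    # root: CAP_DAC_OVERRIDE
  if (( uid == owner )); then
    (( (8#$mode >> 6) & 4 )) && return 0        # owner read bit
  else
    (( 8#$mode & 4 )) && return 0               # other read bit
  fi
  return 1
}

# mac_allows: "yes" or "no", standing in for the SELinux/AppArmor decision.
may_read() {
  local uid=$1 owner=$2 mode=$3 mac_allows=$4
  if ! dac_allows_read "$uid" "$owner" "$mode"; then
    echo "EACCES (DAC)"; return 1               # MAC hook never consulted
  fi
  if [[ $mac_allows != yes ]]; then
    echo "EACCES (MAC)"; return 1
  fi
  echo OK
}
```

With these definitions, `may_read 0 0 000 no` prints `EACCES (MAC)`, which models root reading a mode-000 file such as /etc/shadow with no MAC allow rule. `may_read 1000 0 000 yes` prints `EACCES (DAC)` because the MAC step is never reached.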

For SELinux, the hook does three things: (1) look up the process's security context (e.g., httpd_t), (2) look up the file's security context (e.g., httpd_sys_content_t), (3) check the Access Vector Cache for a rule allowing this specific access. Cache hit takes about 100 nanoseconds. Cache miss triggers a full policy database lookup.

For AppArmor, the hook looks up the profile for the current binary and checks whether the file path matches an allowed pattern. /var/www/** with read permission? Allowed. /etc/shadow? Not in the profile. Denied.
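Here is a rough bash sketch of AppArmor-style path matching. It is deliberately limited: it models only exact paths and a trailing /**. The rules array stands in for a real profile, and `aa_check` is a hypothetical helper, not part of any AppArmor tool.

```bash
#!/usr/bin/env bash
# Toy AppArmor-style path check. Only exact paths and a trailing "/**" are
# modeled; real profiles also support *, ?, {a,b}, owner rules, and more.

# Hypothetical profile: "<pattern> <permissions>"
PROFILE_RULES=(
  "/var/www/** r"
  "/var/log/nginx/** w"
)

rule_matches() {
  local path=$1 pattern=$2 prefix
  if [[ $pattern == */\*\* ]]; then
    prefix=${pattern%\*\*}             # "/var/www/**" -> "/var/www/"
    [[ $path == "$prefix"?* ]]         # something below that directory
  else
    [[ $path == "$pattern" ]]          # exact path
  fi
}

aa_check() {
  local path=$1 want=$2 rule pattern perms
  for rule in "${PROFILE_RULES[@]}"; do
    read -r pattern perms <<< "$rule"
    if rule_matches "$path" "$pattern" && [[ $perms == *"$want"* ]]; then
      echo allowed; return 0
    fi
  done
  echo denied; return 1                 # not in the profile: denied
}
```

`aa_check /var/www/html/index.html r` prints `allowed`. `aa_check /etc/shadow r` prints `denied` because no rule mentions that path.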

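Both produce audit log entries on denial. SELinux writes AVC (Access Vector Cache) denial messages ("avc: denied { read } for ..."). AppArmor writes entries tagged apparmor="DENIED" that name the profile, the path, and the denied permission.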
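AppArmor denials are key=value records. A kernel-log line typically carries apparmor="DENIED" together with operation=, profile=, name=, and denied_mask= fields. The sketch below pulls those fields out of one such line. The sample line is constructed for illustration, so its timestamp, PID, and exact field order are assumptions. The helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Extract fields from an AppArmor denial line (key="value" or key=value pairs).

aa_field() {
  local line=$1 key=$2 re
  re="(^|[[:space:]])${key}=\"([^\"]*)\""     # quoted value
  if [[ $line =~ $re ]]; then echo "${BASH_REMATCH[2]}"; return 0; fi
  re="(^|[[:space:]])${key}=([^[:space:]]+)"  # bare value
  if [[ $line =~ $re ]]; then echo "${BASH_REMATCH[2]}"; return 0; fi
  return 1
}

summarize_denial() {
  printf '%s denied %s on %s (operation=%s)\n' \
    "$(aa_field "$1" profile)" "$(aa_field "$1" denied_mask)" \
    "$(aa_field "$1" name)" "$(aa_field "$1" operation)"
}

# Constructed sample in the usual shape of a kernel-log entry.
SAMPLE='audit: type=1400 audit(1700000000.123:42): apparmor="DENIED" operation="open" profile="/usr/sbin/nginx" name="/etc/shadow" pid=1234 comm="nginx" requested_mask="r" denied_mask="r" fsuid=0 ouid=0'

# On a live host:
#   journalctl -k | grep 'apparmor="DENIED"' | while IFS= read -r l; do summarize_denial "$l"; done
```

For the sample, `summarize_denial "$SAMPLE"` prints `/usr/sbin/nginx denied r on /etc/shadow (operation=open)`.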

Under the Hood

SELinux labels everything. Every process, file, socket, port, and IPC object carries a security context in the format user:role:type:level. The type field does the heavy lifting. Type Enforcement rules explicitly allow specific types to interact. For example, the rule allow httpd_t httpd_sys_content_t:file { read open getattr }; lets the Apache domain open and read web content files. No allow rule means default deny. The entire policy is compiled into a binary blob that init (systemd on modern distributions) loads into the kernel early in boot; semodule can rebuild and reload it at runtime.
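The context format and the default-deny rule lookup can be modeled in a few lines of bash. This needs bash 4.0 or later for associative arrays. The ALLOW table holds only the one example rule from the paragraph above, and anything absent from it is denied. One detail is worth showing: the level field can itself contain a colon (s0-s0:c0.c1023). For that reason, the parser lets `read` hand the remainder to the last variable instead of splitting on every colon. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Toy type enforcement (bash 4+): a table of allow rules, default deny.

# Split user:role:type:level. The level may contain ':' (s0-s0:c0.c1023),
# so `read` assigns everything after the third ':' to CTX_LEVEL.
split_context() {
  IFS=: read -r CTX_USER CTX_ROLE CTX_TYPE CTX_LEVEL <<< "$1"
}

# Key: "source_type target_type class" -> space-separated permissions.
declare -A ALLOW=(
  ["httpd_t httpd_sys_content_t file"]="read open getattr"
)

te_allowed() {
  local src=$1 tgt=$2 cls=$3 perm=$4
  local perms=${ALLOW["$src $tgt $cls"]-}
  [[ " $perms " == *" $perm "* ]]     # no rule or perm not listed: denied
}

# On an SELinux host with setools installed, the real policy can be queried:
#   sesearch -A -s httpd_t -t shadow_t -c file
```

`te_allowed httpd_t httpd_sys_content_t file read` succeeds. `te_allowed httpd_t shadow_t file read` fails because no rule exists for that pair.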

The AVC caches decisions in a hash table keyed by (source_type, target_type, object_class). This is critical for performance -- without the cache, every file access would require a full policy database traversal.
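A toy AVC in bash makes the hit/miss behavior concrete. It keys the cache on (source_type, target_type, class) as described above. The kernel's real cache keys on security identifiers that stand for full contexts, and its slow path is a policy traversal rather than this one-entry table. Denials are cached too, as an empty vector, so a repeated denial also counts as a hit. The names are hypothetical.

```bash
#!/usr/bin/env bash
# Toy Access Vector Cache (bash 4+).

# Slow path stand-in for the full policy database.
declare -A POLICY=(
  ["httpd_t httpd_sys_content_t file"]="read open getattr"
)

declare -A AVC=()          # key -> cached access vector ("" means nothing allowed)
AVC_HITS=0
AVC_MISSES=0

avc_has_perm() {
  local key="$1 $2 $3" perm=$4
  if [[ -n ${AVC["$key"]+cached} ]]; then
    AVC_HITS=$((AVC_HITS + 1))
  else
    AVC_MISSES=$((AVC_MISSES + 1))
    AVC["$key"]=${POLICY["$key"]-}   # cache the whole vector, denials included
  fi
  [[ " ${AVC["$key"]} " == *" $perm "* ]]
}
```

Call `avc_has_perm` directly rather than inside `$(...)`, because a subshell would discard the cache updates. Checking `read` and then `open` for httpd_t on httpd_sys_content_t costs one miss followed by one hit. The second permission is answered from the cached vector.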

SELinux booleans are the admin-friendly knobs. setsebool -P httpd_can_network_connect on allows Apache to make outbound network connections. Each boolean toggles a set of policy rules without recompiling the policy. getsebool -a | grep httpd shows all httpd-related booleans.
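Booleans work because parts of the policy are compiled as conditional blocks that the kernel switches on or off at runtime. The bash sketch below mimics that with one unconditional rule and one rule gated by httpd_can_network_connect. The type names (http_port_t, port_type) and the rule shapes are illustrative rather than copied from the shipped policy. `setsebool_sketch` is hypothetical and only flips an in-memory flag; on a real host that job belongs to `setsebool`.

```bash
#!/usr/bin/env bash
# Toy conditional policy (bash 4+). Type names and rule shapes are
# illustrative, not copied from the shipped policy.

declare -A BOOLEANS=( [httpd_can_network_connect]=off )

# Unconditional rule: always in force.
declare -A BASE_RULES=( ["httpd_t http_port_t tcp_socket"]="name_bind" )

# Conditional rule: only in force while its boolean is on.
COND_BOOL=httpd_can_network_connect
COND_KEY="httpd_t port_type tcp_socket"
COND_PERMS="name_connect"

allowed() {
  local key="$1 $2 $3" perm=$4 perms
  perms=${BASE_RULES["$key"]-}
  if [[ ${BOOLEANS[$COND_BOOL]} == on && $key == "$COND_KEY" ]]; then
    perms+=" $COND_PERMS"
  fi
  [[ " $perms " == *" $perm "* ]]
}

# Stand-in for setsebool: flips the flag, nothing is recompiled.
setsebool_sketch() { BOOLEANS[$1]=$2; }

# On a real host:
#   getsebool httpd_can_network_connect
#   sudo setsebool -P httpd_can_network_connect on   # -P persists across reboots
```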

MCS for container isolation is elegant. Container A gets s0:c1,c2. Container B gets s0:c3,c4. Files written by A are labeled s0:c1,c2. Process B has s0:c3,c4. The categories do not match, so B cannot read A's files -- even if DAC says otherwise.
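The MCS comparison can be sketched as a set check: a process may read a file only when the process's categories include every category on the file. This bash model is simplified. Sensitivity is fixed at s0, ranges are reduced to their high end, and no other MLS constraints are modeled. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Simplified MCS read check (bash 4+): the process's categories must include
# every category on the file. Sensitivity is assumed to be s0 throughout.

# "s0:c1,c2" -> "c1 c2 "; "s0-s0:c0.c3" -> "c0 c1 c2 c3 "; "s0" -> ""
expand_cats() {
  local level=${1##*-} cats part i lo hi     # keep the high end of a range
  local -a parts
  [[ $level == *:* ]] || return 0            # bare "s0": no categories
  cats=${level#*:}
  IFS=, read -ra parts <<< "$cats"
  for part in "${parts[@]}"; do
    if [[ $part == c*.c* ]]; then            # range such as c0.c1023
      lo=${part%%.*}; hi=${part##*.}
      for ((i = ${lo#c}; i <= ${hi#c}; i++)); do printf 'c%d ' "$i"; done
    else
      printf '%s ' "$part"
    fi
  done
}

mcs_can_read() {
  local proc=" $(expand_cats "$1") " c
  for c in $(expand_cats "$2"); do
    [[ $proc == *" $c "* ]] || return 1      # file category the process lacks
  done
  return 0
}
```

Running `mcs_can_read s0:c3,c4 s0:c1,c2` fails, which models Container B reading Container A's file. `mcs_can_read s0-s0:c0.c1023 s0:c1,c2` succeeds because the full range includes c1 and c2.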

AppArmor takes the simpler path. Profiles are human-readable text files that list allowed paths with glob patterns, capabilities, and network access rules. A profile for nginx might allow read access to /var/www/**, write access to /var/log/nginx/**, and the net_bind_service capability. No labels, no contexts, no type enforcement rules. The tradeoff: access is judged by the path used to reach a file. The same inode reached through a different name, such as a pre-existing hard link, is evaluated against that name's rules, and path canonicalization has edge cases.
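An nginx profile along the lines described above might look like the one this script writes. It is a sketch, not a working profile. The profile name nginx-sketch is made up, and a real nginx also needs access to its configuration, PID file, and shared libraries. The script only writes the file to a scratch directory. If apparmor_parser is installed, it also syntax-checks the file with -Q, which skips loading it into the kernel.

```bash
#!/usr/bin/env bash
# Write an illustrative AppArmor profile for nginx to a scratch directory.
set -eu

outdir=${1:-$(mktemp -d)}
profile="$outdir/usr.sbin.nginx-sketch"

cat > "$profile" <<'EOF'
#include <tunables/global>

profile nginx-sketch /usr/sbin/nginx {
  #include <abstractions/base>

  capability net_bind_service,
  network inet tcp,

  /var/www/** r,
  /var/log/nginx/** w,
  deny /etc/shadow r,
}
EOF

echo "wrote $profile"

# Syntax-check without loading into the kernel, if the parser is installed.
if command -v apparmor_parser >/dev/null 2>&1; then
  if apparmor_parser -Q "$profile"; then
    echo "profile parses"
  else
    echo "apparmor_parser reported problems (missing includes?)"
  fi
fi
```

To actually confine nginx, a finished profile would go under /etc/apparmor.d/ and be loaded with `apparmor_parser -r`.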

AppArmor profiles have three modes: enforce (deny and log), complain (allow but log), and kill (SIGKILL on violation). The development workflow: start in complain, run the application through all its code paths, use aa-logprof to generate rules from logs, review, then switch to enforce.
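The complain, logprof, enforce loop maps onto three apparmor-utils commands. The helper below, whose name is hypothetical, prints the plan by default and runs it only when RUN=1. In that mode it pauses so you can exercise the application before aa-logprof reads the logs. The profile path you pass in is an assumption about where your profile lives.

```bash
#!/usr/bin/env bash
# Print (default) or run (RUN=1) the AppArmor profile-development loop.

aa_dev_loop() {
  local profile=${1:?usage: aa_dev_loop /etc/apparmor.d/PROFILE}
  local run=${RUN:-0} step
  local -a steps=(
    "aa-complain $profile"
    "# exercise every code path of the application now"
    "aa-logprof"
    "aa-enforce $profile"
  )
  for step in "${steps[@]}"; do
    if [[ $run != 1 ]]; then
      echo "$step"
    elif [[ $step == \#* ]]; then
      read -rp "Exercise the application, then press Enter... " _
    else
      sudo $step     # intentional word splitting; assumes no spaces in the path
    fi
  done
}
```

`aa_dev_loop /etc/apparmor.d/usr.sbin.nginx` prints the four steps. `RUN=1 aa_dev_loop /etc/apparmor.d/usr.sbin.nginx` executes them, and aa-logprof prompts interactively for each proposed rule.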

Common Questions

How does SELinux handle a root process trying to read /etc/shadow?

Root has CAP_DAC_OVERRIDE, so the DAC check passes. Then the LSM hook fires. SELinux looks for allow httpd_t shadow_t:file read in the policy. No such rule exists. Access denied with an AVC denial message in the audit log. The process sees EACCES despite running as root. Constraining root is the fundamental value of MAC.

Why do organizations disable SELinux?

Complexity. When a new application fails with AVC denials, the quick fix is setenforce 0. The right fix follows these steps:
- Run audit2why to understand each denial. It often points to a boolean or a mislabeled file rather than a missing rule.
- Generate a targeted module with audit2allow -M mymodule.
- Review the generated mymodule.te.
- Install it with semodule -i mymodule.pp.
- Test in permissive mode, then switch back to enforcing.

Organizations that invest in SELinux expertise get one of the strongest host security mechanisms available.
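To see what audit2allow works from, here is a toy bash version of its core step for a single AVC denial. It extracts the permissions, the types from scontext and tcontext, and the class, then prints an allow rule. The sample denial is constructed for illustration. The real audit2allow also merges, sorts, and deduplicates rules across many denials. Note that the rule printed for this sample, letting httpd_t read shadow_t, is exactly the kind of output the review step should reject.

```bash
#!/usr/bin/env bash
# Toy version of audit2allow's core step for one AVC denial line.
# Real workflow on an SELinux host (root, policycoreutils installed):
#   ausearch -m AVC -ts recent | audit2why
#   ausearch -m AVC -ts recent | audit2allow -M mymodule   # writes mymodule.te and mymodule.pp
#   semodule -i mymodule.pp                                 # only after reviewing mymodule.te

avc_to_allow() {
  local line=$1 re perms scon tcon tclass stype ttype
  re='\{ ([^}]*) \}';    [[ $line =~ $re ]] || return 1; perms=${BASH_REMATCH[1]}
  re='scontext=([^ ]+)'; [[ $line =~ $re ]] || return 1; scon=${BASH_REMATCH[1]}
  re='tcontext=([^ ]+)'; [[ $line =~ $re ]] || return 1; tcon=${BASH_REMATCH[1]}
  re='tclass=([^ ]+)';   [[ $line =~ $re ]] || return 1; tclass=${BASH_REMATCH[1]}
  # The type is the third field of user:role:type:level.
  IFS=: read -r _ _ stype _ <<< "$scon"
  IFS=: read -r _ _ ttype _ <<< "$tcon"
  if [[ $perms == *" "* ]]; then
    echo "allow $stype $ttype:$tclass { $perms };"
  else
    echo "allow $stype $ttype:$tclass $perms;"
  fi
}

SAMPLE='type=AVC msg=audit(1700000000.123:456): avc:  denied  { read } for  pid=1234 comm="httpd" name="shadow" dev="dm-0" ino=67890 scontext=system_u:system_r:httpd_t:s0 tcontext=system_u:object_r:shadow_t:s0 tclass=file permissive=0'
```

`avc_to_allow "$SAMPLE"` prints `allow httpd_t shadow_t:file read;`.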

Can SELinux and AppArmor run at the same time?

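The lsm= boot parameter (kernel 5.1 and later) sets which LSMs load and in what order. Small modules such as Yama and Landlock stack alongside a major one. Mainline kernels, however, still allow only one of the major modules, such as SELinux or AppArmor, to be active at a time. Their label-based and path-based models would also make combined debugging extremely difficult. In practice, distributions choose one: RHEL/Fedora use SELinux, Ubuntu/Debian use AppArmor.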
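To check which major module a host actually runs, read /sys/kernel/security/lsm, which is present when securityfs is mounted. The small helper below, whose name is hypothetical, picks SELinux or AppArmor out of that comma-separated list. The example list in the comment is illustrative; real lists vary by distribution and kernel configuration.

```bash
#!/usr/bin/env bash
# Pick the major MAC module out of an LSM list such as
# "lockdown,capability,landlock,yama,apparmor" (format of /sys/kernel/security/lsm).

active_mac() {
  local lsm
  local -a lsms
  IFS=, read -ra lsms <<< "${1-}"
  for lsm in "${lsms[@]}"; do
    case $lsm in
      selinux|apparmor) echo "$lsm"; return 0 ;;
    esac
  done
  echo none
}

if [[ -r /sys/kernel/security/lsm ]]; then
  echo "active MAC: $(active_mac "$(cat /sys/kernel/security/lsm)")"
fi

# Boot-time override, if one was given:  grep -o 'lsm=[^ ]*' /proc/cmdline
```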

How does SELinux handle temporary files?

Type transitions. When httpd_t creates a file in /tmp (a directory labeled tmp_t), the rule type_transition httpd_t tmp_t:file httpd_tmp_t; labels the new file httpd_tmp_t instead of tmp_t. Without a transition rule, a new file inherits the parent directory's type. Here that would put Apache's temporary files in the shared tmp_t bucket that every other domain using /tmp can also touch. restorecon relabels files to their default context based on path patterns.
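The labeling decision for a new file can be modeled as a lookup. If a type_transition rule matches the creating domain, parent directory type, and object class, its result wins. Otherwise the new file takes the parent's type. The single rule here mirrors the shape type_transition httpd_t tmp_t:file httpd_tmp_t; even though the shipped policy expresses this through macros. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Toy new-file labeling (bash 4+): a matching type_transition rule wins,
# otherwise the file inherits the parent directory's type.

# Key: "creating_domain parent_dir_type class" -> new type
declare -A TYPE_TRANSITIONS=(
  ["httpd_t tmp_t file"]=httpd_tmp_t
)

new_file_type() {
  local creator=$1 parent=$2 cls=${3:-file}
  echo "${TYPE_TRANSITIONS["$creator $parent $cls"]:-$parent}"
}

# On a real host, restorecon -Rv DIR resets files under DIR to the default
# contexts defined for their paths.
```

`new_file_type httpd_t tmp_t` prints `httpd_tmp_t`. A domain without a matching rule, for example `new_file_type sshd_t tmp_t`, gets the inherited `tmp_t`.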

How Technologies Use This

Docker

Container A reads Container B's database files because both run as UID 0 with identical Unix permissions. A compromised container traverses to another's storage volume, and standard DAC checks pass because root has CAP_DAC_OVERRIDE. Nothing in the permission model prevents cross-container data access.

The fundamental gap is that Unix DAC permissions are identity-based -- root is root, and there is no distinction between root in Container A and root in Container B. Without Mandatory Access Control, the kernel cannot enforce isolation between two processes that share the same UID and pass the same permission checks.

On SELinux systems, Docker assigns each container a unique MCS category pair like s0:c100,c200. Files written by Container A carry that label, and Container B running with s0:c300,c400 gets EACCES regardless of Unix permissions. On Ubuntu, the docker-default AppArmor profile denies mount, blocks writes to sensitive /proc and /sys paths, and limits ptrace to processes under the same profile. The SELinux AVC resolves these checks in about 100ns, adding negligible overhead while eliminating an entire class of cross-container data leaks.
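A runtime's choice of category pair can be sketched as follows: draw two distinct categories from c0 to c1023, put the lower one first, and skip pairs already in use. This bash version is similar in spirit to what SELinux-aware runtimes do, but it is not their code, and it tracks used pairs only in memory. The function name is hypothetical. The commented docker command at the end uses Docker's --security-opt label=level: option; the image name is an assumption.

```bash
#!/usr/bin/env bash
# Sketch of per-container MCS label allocation (bash 4+).

declare -A MCS_IN_USE=()

# Sets NEW_LABEL to an unused "s0:cA,cB" with A < B, both in 0..1023.
new_mcs_label() {
  local a b t label
  while :; do
    a=$((RANDOM % 1024))
    b=$((RANDOM % 1024))
    (( a == b )) && continue                     # categories must differ
    if (( a > b )); then t=$a; a=$b; b=$t; fi    # lower category first
    label="s0:c$a,c$b"
    [[ -z ${MCS_IN_USE[$label]+used} ]] && break # skip pairs already assigned
  done
  MCS_IN_USE[$label]=1
  NEW_LABEL=$label
}

# On a Docker host with SELinux enabled:
#   docker run --rm --security-opt label=level:s0:c100,c200 fedora cat /proc/self/attr/current
#   ps -eZ | grep container_t
```

Call `new_mcs_label` directly, not in a subshell, so the in-use table persists between calls.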

Kubernetes

A compromised pod running as root reads /etc/shadow and accesses any file on the node. The cluster has 200 nodes, half running RHEL with SELinux and the other half Ubuntu with AppArmor, and there is no unified way to enforce mandatory access control across both.

The challenge is that SELinux and AppArmor use fundamentally different models -- labels vs paths -- and operators would need separate security policies per distro. Without a unified abstraction at the Kubernetes layer, enforcing MAC consistently across a heterogeneous cluster is impractical.

Kubernetes exposes seLinuxOptions in the pod securityContext for SELinux nodes. For AppArmor nodes, it has offered an appArmorProfile field since v1.30; older clusters use the container.apparmor.security.beta.kubernetes.io annotations. Pod Security Standards stop pods from opting out already at the baseline tier, and therefore also at restricted. AppArmor may only be RuntimeDefault or a Localhost profile, and SELinux options are limited to a small set of container types with no custom user or role. On SELinux nodes, each pod gets unique MCS categories preventing cross-pod file access. On AppArmor nodes, the runtime-default profile blocks mount and ptrace. Either way, a compromised pod cannot read /etc/shadow even as root, which sharply limits post-exploitation damage.
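A pod spec for each node family might look like the manifests this function prints. The pod name, the pause image tag, and the MCS level s0:c123,c456 are placeholders. The appArmorProfile field requires Kubernetes 1.30 or later; older clusters use the container.apparmor.security.beta.kubernetes.io/<container> annotation instead. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Print a pod manifest whose securityContext targets one node family.

pod_manifest() {
  local family=$1 mac
  case $family in
    selinux)  mac=$'    seLinuxOptions:\n      level: "s0:c123,c456"' ;;
    apparmor) mac=$'    appArmorProfile:\n      type: RuntimeDefault' ;;
    *) echo "usage: pod_manifest selinux|apparmor" >&2; return 1 ;;
  esac
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: mac-demo-$family
spec:
  securityContext:
$mac
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9
EOF
}

# Usage on a live cluster:  pod_manifest apparmor | kubectl apply -f -
```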

Same Concept Across Tech

Concept	Docker	JVM	Node.js	Go	K8s
MAC confinement	SELinux MCS labels per container; AppArmor docker-default profile	JVM runs under container's MAC domain; no JVM-specific policy needed	Node runs under container's MAC domain; native addons may trigger denials	Go binaries under container MAC domain; static linking reduces denial surface	seLinuxOptions and appArmorProfile in securityContext per pod
Container isolation	MCS categories (s0:c100,c200) prevent cross-container file access	N/A -- JVM does not interact with MAC directly	N/A -- Node does not interact with MAC directly	N/A -- Go does not interact with MAC directly	Pod Security Standards (baseline and restricted) forbid unconfined AppArmor and custom SELinux types
Policy development	docker run with custom --security-opt label or apparmor profile	strace + audit2allow to build policy for JVM syscall patterns	complain mode + aa-logprof for Node's file access patterns	audit2allow for Go binary's minimal syscall set	Kubernetes security profiles operator for automated profile generation
Debugging denials	ausearch -m AVC for SELinux; dmesg for AppArmor	AVC denials show domain_t accessing target_t -- map to JVM file paths	APPARMOR_DENIED in dmesg shows blocked path -- map to Node require() paths	Same debugging tools; Go's direct syscalls produce cleaner denial logs	kubectl logs + node audit logs for MAC denial correlation

Stack Layer	Mechanism
Application	Operates transparently -- MAC decisions happen in kernel without app awareness
Container runtime	Assigns SELinux MCS categories or loads AppArmor profiles before exec
LSM framework	200+ hook points in VFS, networking, IPC, capabilities invoke registered MAC modules
SELinux engine	Loads compiled binary policy at boot; AVC cache handles most decisions in ~100ns
AppArmor engine	Compiles path-based profiles at load time; matches file paths against glob patterns

Design rationale: MAC exists because DAC is identity-based, and once an attacker becomes root, identity-based checks are meaningless. A kernel-enforced policy layer that operates independently of uid/gid means root inside httpd_t still cannot read shadow_t. SELinux went with labels because they survive renames and hard links -- completeness at the cost of complexity. AppArmor went with paths because administrators can read and write profiles in minutes -- simplicity at the cost of edge cases around hard links and path canonicalization.

If You See This, Think This

Symptom	Likely Cause	First Check
Application works as root but fails with EACCES	SELinux type enforcement blocking access	ausearch -m AVC -ts today
File readable in its old location but denied after mv	mv preserves the source SELinux label, so the file never gets the destination's default context	ls -Z on the file; run restorecon -Rv on the directory
Container A can read Container B's files	MCS categories not assigned or identical for both containers	ps -eZ to compare container MCS labels
AppArmor profile breaks after directory restructuring	Path-based rules no longer match new file locations	aa-logprof to update profile from denial logs
New service fails immediately after deployment on RHEL	No SELinux policy module for the service; default deny blocks everything	setenforce 0 temporarily to confirm; then audit2allow -M to build policy
Hard link bypasses AppArmor file restriction	AppArmor matches paths, not inodes; hard link creates new path	Verify with ls -li; consider SELinux for inode-level enforcement

When to Use / Avoid

Defense-in-depth beyond DAC and capabilities -- MAC is what stops root-level lateral movement
Isolating containers from each other via SELinux MCS categories on shared-kernel hosts
Compliance mandates requiring Mandatory Access Control (PCI-DSS, HIPAA, FedRAMP)
Confining services to least-privilege file, network, and capability access
Never disable SELinux entirely to fix an app failure -- switch to permissive and use audit2allow
Avoid AppArmor during rapid prototyping where file layouts change constantly, since path rules break on reorganization

Try It Yourself

# Check which LSMs are active (comma-separated list)

cat /sys/kernel/security/lsm 2>/dev/null || echo 'LSM info not available'

# SELinux: show current mode and the file context of /etc/passwd

if command -v getenforce >/dev/null; then getenforce; ls -Z /etc/passwd; else echo 'SELinux not available'; fi

# SELinux: search today's AVC denials (reading the audit log needs root)

if command -v ausearch >/dev/null; then sudo ausearch -m AVC -ts today | head -20; else echo 'ausearch not available'; fi

# SELinux: draft policy rules from those denials (review before installing anything)

if command -v audit2allow >/dev/null; then sudo ausearch -m AVC -ts today | audit2allow | head -10; else echo 'audit2allow not available'; fi

# AppArmor: show loaded profiles and their modes (needs root)

if command -v aa-status >/dev/null; then sudo aa-status | head -20; else echo 'AppArmor not available'; fi

# AppArmor: show the profile for a specific binary

if [ -r /etc/apparmor.d/usr.sbin.nginx ]; then head -20 /etc/apparmor.d/usr.sbin.nginx; else echo 'No nginx AppArmor profile found'; fi

Debug Checklist

1. cat /sys/kernel/security/lsm -- check which LSMs are active on this system
2. getenforce -- check SELinux mode (Enforcing/Permissive/Disabled)
3. ausearch -m AVC -ts today | head -20 -- find recent SELinux denials
4. aa-status -- show loaded AppArmor profiles and their enforcement mode
5. ls -Z /path/to/file -- view SELinux security context on a file
6. ps -eZ | grep $PROCESS -- view SELinux domain of a running process

Key Takeaways

✓SELinux labels every object (inode-level); AppArmor matches on file paths. This means SELinux survives file renames and hard links (the label stays on the inode), while AppArmor rules break when paths change. The tradeoff: AppArmor profiles are dramatically simpler to write and understand.
✓SELinux's type enforcement is default-deny. A rule like 'allow httpd_t httpd_sys_content_t:file { read open getattr };' explicitly permits Apache to read web content files. Without that rule, the access is denied and logged as an AVC denial, unless a dontaudit rule suppresses the log entry. Every allowed action must be declared.
✓MCS (Multi-Category Security) is how container runtimes use SELinux for isolation. Each container gets a unique category pair like s0:c1,c2. Files written by that container are labeled with the same categories. Another container with s0:c3,c4 cannot read them, even with correct DAC permissions.
✓AppArmor profiles support file globs (/var/www/** for recursive), owner conditionals, and capability lists. A profile for nginx: allow read /var/www/**, allow write /var/log/nginx/**, deny /etc/shadow, network inet tcp. Compilation happens at profile load time, not on every access.
✓Setting SELinux to permissive mode (setenforce 0) logs violations without blocking them -- essential for debugging 'why does my app fail?' But permissive is NOT a security posture. It is a diagnostic tool. Production must run enforcing.

Common Pitfalls

✗Mistake: Disabling SELinux entirely because an application fails. Reality: This removes a critical security layer. Read the AVC denials with 'audit2why' first: the right fix may be a boolean or a relabel rather than a new rule. When a rule really is missing, use 'audit2allow' to generate it from the denials, review it, and apply it.
✗Mistake: Assuming AppArmor path rules follow a file through hard links. Reality: Rules are matched against the path used to open the file. AppArmor does mediate creating a link (the l permission). But if a hard link to /etc/shadow already exists at /tmp/shadow_copy, a rule denying /etc/shadow does not apply to the new path. SELinux handles this case because the label lives on the inode, not the path.
✗Mistake: Moving files instead of copying them and wondering why SELinux breaks. Reality: 'mv' preserves the source label. 'cp' inherits the destination directory's default context. A config file moved from /tmp to /etc/httpd/ keeps its tmp_t label, and httpd cannot read it. Fix with 'restorecon -Rv /etc/httpd/'.
✗Mistake: Writing overly broad AppArmor profiles (allowing /** rw) to avoid breakage. Reality: This defeats the purpose of MAC entirely. Start in complain mode ('aa-complain /path/to/profile'), exercise the application, use 'aa-logprof' to generate tight rules from the logs, then switch to enforce.

Reference

libselinux Functions

getcon, setcon, security_compute_av

Tools

audit2allow / audit2why, sesearch / seinfo, aa-status / aa-logprof

In One Line

Turn on MAC -- SELinux on RHEL, AppArmor on Ubuntu -- so that root is no longer a skeleton key and post-compromise lateral movement hits a wall.
