Kernel Livepatching & Runtime Code Replacement
Mental Model
A highway overpass under repair. Traffic cannot stop. Engineers build the replacement section in a workshop off-site. At deployment time, they do not demolish the old section. They install a detour sign at the entrance that redirects every car to the new section. Cars already on the old section finish their trip normally. Once every car has exited the old section, it is effectively retired. The highway never closes. Drivers never stop.
The Problem
A critical CVE is disclosed on a Friday afternoon, and 500 production servers need patching. A traditional kernel update requires rebooting each one -- draining connections, migrating workloads, cycling through maintenance windows. Livepatching applies the fix in seconds with no downtime. The replacement function loads as a module, ftrace redirects calls to it, and the vulnerability is gone before anyone finishes reading the advisory.
Architecture
A critical CVE drops. Hundreds of production servers are vulnerable. The fix is a three-line change in a kernel function. A traditional kernel update means rebooting every server -- draining connections, migrating workloads, cycling through maintenance windows. For a fleet of 500 servers, that process takes days.
Kernel livepatching applies that three-line fix to the running kernel in under a second. No reboot. No downtime. No connection drops. The vulnerable function is replaced in place while every process keeps running.
This is not hot reloading or dynamic linking. It is surgical function replacement at the lowest level of the operating system, using the same ftrace infrastructure that powers kernel tracing tools.
How It Works
The livepatch framework replaces kernel functions at runtime by hijacking their entry points. Every kernel function compiled with -mfentry starts with a call to __fentry__, which ftrace patches to a NOP when inactive and can rewrite into a call to its trampoline when a hook is registered. The livepatch framework registers an ftrace handler on the target function; when the hook fires, the handler rewrites the saved instruction pointer so execution resumes in the replacement function instead of the original body.
Here is the sequence:
- Build phase. kpatch-build compiles the original and patched kernel source. It compares the resulting object files at the function level, identifies which functions changed, and packages them into a kernel module (.ko file) that uses the livepatch API.
- Load phase. Loading the module calls klp_enable_patch(). The framework resolves symbol names via kallsyms, verifies the target functions exist, and registers ftrace hooks for each replacement. At this point, the hooks are installed but not all tasks are running the new code.
- Transition phase. The framework sets TIF_PATCH_PENDING on every task in the system. As each task returns to userspace (or enters the idle loop), the flag is cleared and the task switches to the "new universe" where it will use patched functions. The patch is fully active when all tasks have transitioned.
- Steady state. Every call to a patched function hits the ftrace hook and gets redirected. The original function code still exists in memory but is never executed. Overhead is minimal -- a few nanoseconds per redirected call.
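For concreteness, here is a minimal sketch of what such a module looks like, modeled on the kernel's samples/livepatch/livepatch-sample.c. The vulnerable_func / patched_func names and the void signature are placeholders; a real patch mirrors the exact signature of the function it replaces.
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/livepatch.h>

/* Replacement body -- must have the same signature as the original. */
static void patched_func(void)
{
        /* fixed logic goes here */
}

static struct klp_func funcs[] = {
        {
                .old_name = "vulnerable_func",  /* resolved via kallsyms at load time */
                .new_func = patched_func,
        },
        { }     /* terminator */
};

static struct klp_object objs[] = {
        {
                /* .name = NULL targets vmlinux; set a module name to patch a module */
                .funcs = funcs,
        },
        { }     /* terminator */
};

static struct klp_patch patch = {
        .mod  = THIS_MODULE,
        .objs = objs,
};

static int livepatch_init(void)
{
        return klp_enable_patch(&patch);  /* registers ftrace hooks, starts the transition */
}

static void livepatch_exit(void)
{
}

module_init(livepatch_init);
module_exit(livepatch_exit);
MODULE_LICENSE("GPL");
MODULE_INFO(livepatch, "Y");  /* marks this .ko as a livepatch module */
Everything after this point -- symbol resolution, ftrace registration, the per-task transition -- is handled by the livepatch core; the module only declares which functions replace which.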
The Consistency Model
This is the hardest part of livepatching and the reason naive function replacement is dangerous.
Consider two functions: validate_packet() and process_packet(). A security fix changes both. If task A calls the old validate_packet() and then the new process_packet(), the system is in an inconsistent state. The new process_packet() might expect validation guarantees that the old validate_packet() does not provide.
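A hypothetical, userspace-flavored sketch of that hazard (the pkt structure and all three functions are invented for illustration; they are not real kernel code):
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_PKT_LEN 1500

struct pkt { bool header_ok; size_t len; char data[4096]; };
static char buf[MAX_PKT_LEN];

/* Old universe: validation only checks the header. */
static bool validate_packet_old(struct pkt *p)
{
        return p->header_ok;
}

/* New universe: the fix adds a length check here... */
static bool validate_packet_new(struct pkt *p)
{
        return p->header_ok && p->len <= MAX_PKT_LEN;
}

/* ...and the new process_packet() relies on it. */
static void process_packet_new(struct pkt *p)
{
        memcpy(buf, p->data, p->len);  /* overflows buf if only the OLD validate ran */
}
A task that ran the old validate_packet() and then, mid-request, the new process_packet() gets exactly the mix of guarantees that the consistency model exists to forbid.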
The livepatch consistency model prevents this by maintaining two "universes" -- the old function set and the new function set. Each task is in exactly one universe. The transition happens per-task at safe points:
- Return to userspace from a system call
- Entry to the idle loop
- Signal delivery
At these points, the task has no kernel stack frames that reference the old functions. It is safe to switch. The task atomically moves to the new universe and all subsequent kernel function calls use the patched versions.
The /sys/kernel/livepatch/<patch>/transition file reads 1 during this migration and 0 once every task has switched. Long-running kernel threads that never return to userspace (like certain kworkers) can delay this transition.
Building a Livepatch
The kpatch-build tool automates the entire process:
# Install dependencies (RHEL/CentOS)
yum install kpatch kpatch-build kernel-devel-$(uname -r) kernel-debuginfo-$(uname -r)
# Build a livepatch module from a source patch
kpatch-build -s /usr/src/kernels/$(uname -r) cve-2024-xxxx.patch
# The output is a .ko file
ls livepatch-cve2024xxxx.ko
Under the hood, kpatch-build does the following:
- Extracts the kernel source corresponding to the running kernel.
- Compiles it once (original).
- Applies the patch and compiles again (patched).
- Runs create-diff-object to compare each object file and extract changed functions.
- Links the changed functions into a kernel module that calls klp_enable_patch() on load.
The output module is a standard .ko file. It can be signed, distributed via RPM, and managed with systemd services.
Applying and Managing Livepatches
# Load the livepatch
kpatch load livepatch-cve2024xxxx.ko
# Verify it is active
kpatch list
# Expected output:
# livepatch_cve2024xxxx [enabled]
# Check via sysfs
cat /sys/kernel/livepatch/livepatch_cve2024xxxx/enabled
# 1
# Check transition status (0 = complete, 1 = in progress)
cat /sys/kernel/livepatch/livepatch_cve2024xxxx/transition
# 0
# Disable the patch (revert to original functions)
echo 0 > /sys/kernel/livepatch/livepatch_cve2024xxxx/enabled
# Re-enable
echo 1 > /sys/kernel/livepatch/livepatch_cve2024xxxx/enabled
# Unload (must disable first)
kpatch unload livepatch-cve2024xxxx
What Cannot Be Livepatched
Livepatching is function replacement. It swaps one function body for another at the same entry point. This means several classes of changes are out of reach:
Data structure changes. Adding a field to struct sock or changing the size of a hash table affects every instance of that structure already in memory. Livepatching cannot retroactively modify existing allocations.
Function signature changes. If the fix changes the number or type of parameters, every call site would need updating. Livepatching only replaces the callee, not the callers.
Inline functions. Functions marked __always_inline or automatically inlined by the compiler have no single entry point to hook. The code is duplicated at each call site.
Init/exit paths. Functions that run only during boot or module initialization have already executed. Replacing them after the fact has no effect.
Assembly code. Hand-written assembly in .S files does not go through the C compiler and lacks __fentry__ prologues.
In practice, roughly 60-70% of kernel CVE fixes consist of self-contained function body changes that are livepatchable. The remaining 30-40% require traditional kernel updates and reboots.
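A hypothetical pair of examples makes the boundary concrete (parse_option and conn_state are invented for illustration): a missing bounds check is a pure function-body change and qualifies; a new structure field does not, because every instance already allocated in memory has the old layout.
#include <linux/errno.h>
#include <linux/string.h>
#include <linux/types.h>

/* Livepatchable: the entire fix lives inside one function body. */
static int parse_option(const char *src, size_t len)
{
        char opt[32];

        if (len >= sizeof(opt))         /* the added bounds check -- body-only change */
                return -EINVAL;
        memcpy(opt, src, len);
        opt[len] = '\0';
        return 0;
}

/*
 * NOT livepatchable: suppose the fix instead added a lock to a shared structure:
 *
 *     struct conn_state {
 *             u32        flags;
 *             u64        last_seen;
 *             spinlock_t lock;        <-- new field
 *     };
 *
 * Every conn_state already allocated in the running kernel was sized and laid
 * out without the lock, so swapping function bodies cannot retrofit it.
 */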
Under the Hood: ftrace Integration
The ftrace framework is the backbone of livepatching. Here is how the redirection works at the instruction level on x86-64:
Every function compiled with -mfentry begins with:
vulnerable_func:
call __fentry__ # 5-byte NOP when ftrace is inactive
push rbp
mov rbp, rsp
...
When ftrace is inactive, the call to __fentry__ is patched to a NOP (no operation). When the livepatch framework registers a hook on vulnerable_func, ftrace replaces the NOP with a call to the ftrace trampoline. The trampoline saves the registers and invokes the registered handlers; for livepatching, the handler rewrites the saved instruction pointer (the return address pushed by call __fentry__) to point to patched_func instead of the instruction after the call.
The result: the caller thinks it called vulnerable_func, but execution lands in patched_func. The replacement function has the same signature, operates on the same arguments, and returns to the same caller. The substitution is invisible.
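A stripped-down sketch of that redirection, written against the older pt_regs-based ftrace handler signature (kernels before roughly 5.11) and x86-64 only. The real logic lives in klp_ftrace_handler() in kernel/livepatch/patch.c and additionally handles per-task transition state, recursion protection, and the newer ftrace_regs API; kallsyms_lookup_name() is shown for clarity even though it is no longer exported to modules since 5.7.
#include <linux/errno.h>
#include <linux/ftrace.h>
#include <linux/kallsyms.h>
#include <linux/module.h>

static void patched_func(void)
{
        /* replacement body */
}

/* Runs inside the ftrace trampoline on every call to the hooked function. */
static void notrace redirect_handler(unsigned long ip, unsigned long parent_ip,
                                     struct ftrace_ops *ops, struct pt_regs *regs)
{
        /* Rewriting the saved instruction pointer makes the trampoline resume
         * execution in patched_func instead of the original function body. */
        regs->ip = (unsigned long)patched_func;
}

static struct ftrace_ops redirect_ops = {
        .func  = redirect_handler,
        .flags = FTRACE_OPS_FL_SAVE_REGS | FTRACE_OPS_FL_IPMODIFY,
};

static int __init redirect_init(void)
{
        unsigned long ip = kallsyms_lookup_name("vulnerable_func");
        int ret;

        if (!ip)
                return -ENOENT;
        ret = ftrace_set_filter_ip(&redirect_ops, ip, 0, 0);  /* hook only this symbol */
        if (ret)
                return ret;
        return register_ftrace_function(&redirect_ops);
}

static void __exit redirect_exit(void)
{
        unregister_ftrace_function(&redirect_ops);
}

module_init(redirect_init);
module_exit(redirect_exit);
MODULE_LICENSE("GPL");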
Atomic Replace
Before kernel 5.1, livepatches were cumulative. If patch-1 replaced functions A and B, and patch-2 needed to fix function A differently plus add a fix for function C, patch-2 had to contain: the new fix for A, the same fix for B (from patch-1), and the new fix for C. Every new patch carried the full history of all previous patches.
Atomic replace (the replace flag in struct klp_patch) changed this. A patch with replace = true tells the framework: "this patch is the complete set of all needed replacements. Disable all previous patches and use only this one." This simplifies patch management enormously for vendors maintaining long-lived kernel branches with dozens of accumulated livepatches.
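In the module source this is a single field on the patch descriptor; a sketch reusing the structures from the earlier module example:
static struct klp_patch patch = {
        .mod     = THIS_MODULE,
        .objs    = objs,    /* the complete, cumulative set of replacement functions */
        .replace = true,    /* atomically disable and supersede all earlier livepatches */
};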
Common Questions
How much overhead does livepatching add?
The overhead is the cost of one ftrace handler invocation per call to a patched function. On modern x86-64 hardware, this is 3-7 nanoseconds per call. For a function called millions of times per second (like network packet processing), this adds up to low single-digit milliseconds per second -- measurable but rarely significant. For most patched functions (security fixes to rarely-hit error paths), the overhead is effectively zero.
Can a livepatch be reverted?
Yes. Writing 0 to /sys/kernel/livepatch/<patch>/enabled triggers a reverse transition. The framework sets TIF_PATCH_PENDING on all tasks again and transitions them back to the old universe. Once all tasks have switched, the ftrace hooks are removed and the original functions execute directly. The module can then be unloaded with rmmod.
What happens if the system reboots after a livepatch is applied?
The livepatch is gone. It was a kernel module loaded into the running kernel's memory. On reboot, the original (unpatched) kernel loads. The kpatch service can be configured to re-apply patches on boot, but the correct long-term fix is installing an updated kernel package.
How do distributions verify that a livepatch is correct?
Red Hat, SUSE, and Canonical all maintain dedicated livepatching teams that build and test patches against their supported kernels. The testing includes: building the module, loading it on test systems, running the CVE reproducer to confirm the fix, running regression test suites, and stress-testing the transition mechanism. Each vendor signs the livepatch modules with their kernel signing key for Secure Boot compatibility.
How Technologies Use This
CVE-2024-1086, a use-after-free in the netfilter nf_tables component, allows container escape from any Docker container to host root. A production fleet of 800 Docker hosts running RHEL 9 is vulnerable. Each host runs 30 to 50 containers serving live traffic. A traditional kernel update requires draining every container off each host, rebooting, and waiting for containers to reschedule. At 5 minutes per host with rolling restarts, the full fleet takes 66 hours.
Livepatching replaces the vulnerable nft_verdict_init() function at runtime. kpatch-build compiles the original kernel source and the patched source, diffs the object files, and extracts the changed function into a .ko module. Loading the module with kpatch load registers the replacement function with the livepatch framework. ftrace intercepts every call to the original nft_verdict_init() at its __fentry__ prologue and redirects execution to the patched version. The container escape path is closed in under one second per host, with zero container restarts.
An Ansible playbook pushes the .ko module to all 800 hosts and runs kpatch load in parallel. The entire fleet is patched in 4 minutes. No container sees a connection drop. No load balancer drain is needed. The livepatch stays active until the next maintenance window, when a full kernel RPM update makes the fix permanent and the livepatch module is unloaded.
A 500-node Kubernetes cluster running on bare-metal RHEL 9 nodes receives a critical kernel security advisory. The vulnerability is in the cgroup v2 cpu controller, allowing a pod with CAP_SYS_ADMIN to escalate to host root. Rolling reboots of the cluster mean cordoning each node, draining its pods (2 to 8 minutes depending on graceful termination periods), rebooting (90 seconds), and waiting for the kubelet to rejoin and pods to reschedule. At 10 minutes per node, the full cluster takes 83 hours to patch with rolling reboots, or 3.5 days.
Livepatching eliminates the drain-reboot-reschedule cycle entirely. The security team builds the livepatch module from the RHEL kernel source diff, tests it on a staging node, then distributes it via Ansible to all 500 nodes simultaneously. Each node runs kpatch load, which calls klp_enable_patch() to register ftrace hooks on the vulnerable cgroup functions. The per-task consistency model transitions each running process to the new function universe as it returns to userspace. Within 30 seconds, the /sys/kernel/livepatch/<patch>/transition file reads 0 on every node, confirming all tasks are running patched code.
Total time from advisory to full fleet patched: 12 minutes, including build, test, and rollout. No pod was evicted. No PodDisruptionBudget was violated. No StatefulSet had to restart. The cluster continued serving traffic at full capacity throughout the operation. On the next scheduled maintenance window, a rolling kernel update replaces the livepatch with the permanent fix.
A PostgreSQL 16 primary server on RHEL 9 handles 12,000 transactions per second for a financial trading platform. The host kernel has a bug in the ext4 journaling path that causes data corruption during fsync() under specific I/O patterns. PostgreSQL relies on fsync() for WAL durability, and the corruption has caused two unclean recoveries in the past month. The database cannot be taken offline for a kernel reboot without a planned failover to a replica, which requires a 15-minute maintenance window and coordination across three teams.
The ext4 bug is a function-body-only fix in ext4_journal_submit_inode_data_buffers(), making it a candidate for livepatching. The operations team builds the livepatch module, loads it with kpatch load, and the ftrace hook redirects all calls to the fixed function. PostgreSQL continues running throughout. The WAL writer, checkpointer, and all backend processes transition to the patched kernel code as they return from their next system call. Within 5 seconds, the transition file reads 0 and every database process is executing the corrected fsync path.
The database stays online at full throughput during the entire operation. No failover to the replica is needed. No client connections are dropped. The trading platform continues processing transactions while the kernel bug that threatened data integrity is eliminated. The permanent kernel update is scheduled for the next quarterly maintenance window, when the planned failover can proceed on the team's own timeline rather than under emergency pressure.
Same Concept Across Tech
| Technology | How it uses livepatching | Key limitation |
|---|---|---|
| RHEL (kpatch) | kpatch-build generates .ko modules from source diffs. kpatch load applies them. Ansible distributes fleet-wide | Cannot patch data structure changes. Cumulative patches must include all previous fixes |
| SUSE (kGraft/livepatch) | Originally kGraft with lazy hybrid switching. Now uses upstream livepatch framework. SUSE Live Patching service delivers patches | Same function-body-only limitation. kGraft's original consistency model was replaced upstream |
| Ubuntu (Canonical Livepatch) | Snap-based client receives pre-built livepatches from Canonical. Automatic application, no manual intervention | Covers only Canonical-maintained kernels. Custom kernels need custom livepatch builds |
| Cloud Providers | Hypervisor kernel patching without VM migration. Patch thousands of hosts in minutes instead of days | Hypervisor-specific patches must avoid data structure changes that affect VM state |
| Embedded / IoT | Devices that cannot be rebooted (medical, automotive, industrial control) use livepatching for security updates | Limited by device kernel configuration. Many embedded kernels lack ftrace support |
Stack layer mapping (livepatch not activating):
| Layer | What to check | Tool |
|---|---|---|
| Kernel config | Is CONFIG_LIVEPATCH=y in the running kernel? | grep CONFIG_LIVEPATCH /boot/config-$(uname -r) |
| ftrace | Is ftrace enabled and functional? | cat /proc/sys/kernel/ftrace_enabled |
| Module loading | Did the livepatch module load without errors? | dmesg |
| Consistency | Are all tasks transitioned to the new universe? | cat /sys/kernel/livepatch/<patch>/transition |
| Task blocking | Which tasks are stuck in the old universe? | for p in /proc/[0-9]*; do echo "$p $(cat $p/patch_state 2>/dev/null)"; done |
Design Rationale
The original kpatch (Red Hat) and kGraft (SUSE) approaches differed in their consistency models. kpatch used stop_machine() to freeze all CPUs and switch atomically -- simple but caused latency spikes. kGraft used lazy per-task switching -- no freeze but complex to reason about. The upstream livepatch framework (merged in kernel 4.0) initially shipped without a consistency model; the hybrid model added in kernel 4.12 combines kGraft's per-task switching with kpatch-style stack checking of sleeping tasks, transitioning each task as it returns to userspace and providing both safety and low latency. The atomic replace feature (kernel 5.1) solved the cumulative patch problem by allowing a single patch to supersede all previous ones.
If You See This, Think This
| Symptom | Likely cause | First check |
|---|---|---|
| kpatch load fails with "invalid argument" | Kernel built without CONFIG_LIVEPATCH or ftrace disabled | grep CONFIG_LIVEPATCH /boot/config-$(uname -r) |
| Patch loaded but transition stays at 1 | Long-running kernel thread stuck in kernel space, never reaching a safe transition point | cat /sys/kernel/livepatch/<patch>/transition and check for blocked tasks |
| System panic after loading livepatch | Patch built against wrong kernel version or ABI mismatch | Verify patch was built for the exact running kernel: uname -r vs patch metadata |
| kpatch-build fails during compilation | Kernel source tree does not match running kernel, or missing build dependencies | Verify kernel-devel package matches uname -r exactly |
| Multiple livepatches conflict | Stacked patches modifying overlapping functions without atomic replace | Use atomic replace mode (kernel 5.1+) or build cumulative patches |
| Performance degradation after livepatching | Excessive ftrace hooks adding overhead to hot-path functions | perf top to identify if patched functions show unusual overhead |
When to Use / Avoid
Relevant when:
- A critical CVE requires immediate patching on production servers that cannot tolerate downtime
- Managing large fleets (hundreds or thousands of hosts) where rolling reboots take days
- Running hypervisors where rebooting means live-migrating all VMs off the host first
- Operating in regulated environments with strict SLA uptime requirements (99.99%+)
Watch out for:
- Data structure changes and function signature modifications cannot be livepatched
- Long-running kernel threads may delay patch transition (transition stays at 1)
- Livepatches should be treated as temporary -- schedule a real kernel update within the next maintenance window
- Stacking multiple livepatches without atomic replace creates hard-to-debug interaction bugs
Try It Yourself
# Check if the running kernel supports livepatching
grep CONFIG_LIVEPATCH /boot/config-$(uname -r)

# List all currently loaded livepatches
ls -la /sys/kernel/livepatch/

# Check the status of a specific livepatch
cat /sys/kernel/livepatch/livepatch_cve2024xxxx/enabled && cat /sys/kernel/livepatch/livepatch_cve2024xxxx/transition

# Load a livepatch module with kpatch
kpatch load livepatch-cve2024xxxx.ko

# List all kpatch-managed patches and their states
kpatch list

# Disable a livepatch (revert to original functions) without unloading
echo 0 > /sys/kernel/livepatch/livepatch_cve2024xxxx/enabled

# Force-complete a stalled transition (dangerous, last resort)
echo 1 > /sys/kernel/livepatch/livepatch_cve2024xxxx/force

# Check ftrace status (prerequisite for livepatching)
cat /proc/sys/kernel/ftrace_enabled

# View which functions a livepatch replaces (each subdirectory is one patched
# function; functions in modules appear under the module name instead of vmlinux)
ls /sys/kernel/livepatch/livepatch_cve2024xxxx/vmlinux/

# Monitor livepatch kernel messages in real time
dmesg -w | grep livepatch

# Check task patch state (-1 = no transition in progress, 0 = old universe, 1 = new universe)
for pid in $(ls /proc | grep -E '^[0-9]+$' | head -20); do echo -n "PID $pid: "; cat /proc/$pid/patch_state 2>/dev/null || echo "N/A"; done

# Build a livepatch module from a source diff (RHEL/CentOS)
kpatch-build -s /usr/src/kernels/$(uname -r) cve-2024-xxxx.patch

# Verify Canonical Livepatch status on Ubuntu
canonical-livepatch status --verbose

Debug Checklist
1. Verify ftrace is enabled: cat /proc/sys/kernel/ftrace_enabled
2. Check livepatch kernel config: grep CONFIG_LIVEPATCH /boot/config-$(uname -r)
3. List active livepatches: ls /sys/kernel/livepatch/
4. Check patch enabled state: cat /sys/kernel/livepatch/<patch>/enabled
5. Check transition progress: cat /sys/kernel/livepatch/<patch>/transition
6. Find tasks blocking transition: cat /proc/<pid>/patch_state (-1 = no transition, 0 = old universe, 1 = new universe)
7. Review livepatch kernel messages: dmesg | grep livepatch
8. Verify kpatch service status: systemctl status kpatch
Key Takeaways
- ✓Livepatching replaces entire function bodies, not individual instructions. The granularity is one function at a time. If a CVE fix modifies three functions, the livepatch module contains three replacement functions. ftrace redirects each one independently.
- ✓The consistency model is what separates modern livepatching from naive function replacement. Without it, one task could call the old version of function A, then the new version of function B that depends on the new behavior of A. The per-task universe switching prevents this by ensuring each task sees either all-old or all-new functions, never a mix.
- ✓A livepatch module is a normal kernel module (.ko file) that calls klp_enable_patch() in its init function. It can be built, distributed, and loaded with standard module tools. The livepatch framework handles the ftrace registration and consistency transitions.
- ✓Livepatches are cumulative. A second patch must account for the first. If patch-1 replaces function foo() and patch-2 also modifies foo(), then patch-2 must contain the combined fix. Atomic replace mode (since kernel 5.1) simplifies this by allowing a single patch to replace all previous patches at once.
- ✓The compiler must generate functions with __fentry__ prologues for livepatching to work. This is controlled by CONFIG_FUNCTION_TRACER, which passes -pg together with -mfentry on x86-64. Functions that are inlined, marked __always_inline, or compiled without fentry cannot be livepatched.
Common Pitfalls
- ✗Assuming any kernel bug can be livepatched. Data structure changes, new struct fields, modified function signatures, and changes to inline functions or assembly cannot be applied via livepatching. Roughly 60-70% of security fixes are livepatchable. The rest require a full reboot.
- ✗Leaving livepatches as permanent fixes. Livepatches are emergency bandages, not long-term solutions. They accumulate, interact in subtle ways, and make debugging harder because the running code no longer matches the installed kernel package. The correct workflow is: livepatch immediately, then schedule a real kernel update within the next maintenance window.
- ✗Ignoring the transition state. After loading a livepatch, the transition file in /sys/kernel/livepatch/ may stay at 1 for seconds or minutes if long-running kernel tasks have not reached a safe transition point. A patch is not fully active until transition reaches 0 and all tasks have switched to the new universe.
- ✗Stacking multiple independent livepatches without atomic replace. Each patch hooks the same ftrace entry points. If two patches modify different call sites in the same function, the interactions become unpredictable. Atomic replace (replace flag in klp_patch) was introduced in kernel 5.1 specifically to solve this by treating each new patch as a complete replacement of all previous patches.
Reference
In One Line
Livepatching swaps vulnerable kernel functions at runtime via ftrace redirection -- no reboot, no downtime, but only function-body changes qualify; data structure modifications still demand a restart.