Timers, Clocks & High-Resolution Timers
Mental Model
Three clocks on the same wall. The first syncs with the radio station every hour and occasionally jumps backward when it corrects itself -- great for knowing what time it is right now, unreliable for timing a recipe. The second started when the house was built and never adjusts. Perfect for measuring durations, but it pauses during power outages. The third also started at construction but keeps counting through outages -- the only way to know true total elapsed time. Pick the wrong clock and dinner burns or the alarm never fires.
The Problem
NTP corrects the clock backward by 5 seconds at 2 AM, and 47 services using CLOCK_REALTIME for their watchdog deadlines simultaneously think they are overdue. Cascade restart. Meanwhile, a 1ms setTimeout fires at 4-6ms on a 250Hz kernel, and the trading logic behind it loses $12,000 per missed millisecond. On a VM after live migration, System.nanoTime() goes backward, producing negative latency measurements that corrupt percentile histograms built from 5 million calls per second. And a process that creates 10,000 POSIX timers exhausts kernel memory for signal slots -- timer_create starts returning EAGAIN.
Architecture
A 10-second timeout is set. NTP adjusts the clock backward by 5 seconds. The timeout takes 15 seconds.
This is not a hypothetical. It happens in production. And the fix is embarrassingly simple: the wrong clock was in use.
Linux has multiple clocks for a reason. Pick the wrong one and timeouts break. Pick the right one and they work even through NTP adjustments, CPU frequency changes, and laptop suspends.
What Actually Happens
At boot, the kernel's clocksource framework evaluates available hardware timers -- TSC (per-CPU, ~1ns resolution), HPET (memory-mapped, ~100ns), ACPI PM Timer (legacy fallback) -- and selects the best one. The timekeeping subsystem converts raw counter readings into nanosecond timestamps, maintaining separate epochs for each clock type.
CLOCK_REALTIME tracks wall clock time. It can be adjusted by NTP, settimeofday, or adjtimex. It can jump backward. Use it for log timestamps and calendar scheduling.
CLOCK_MONOTONIC starts at boot and only moves forward. NTP can slew it (gradually speed up or slow down), but it never jumps. Use it for timeouts, elapsed time, and performance measurement.
CLOCK_BOOTTIME is CLOCK_MONOTONIC plus time spent in system suspend. A 10-minute CLOCK_MONOTONIC timer will not fire if the laptop sleeps for 2 hours -- CLOCK_BOOTTIME will fire immediately on wake.
The kernel offers two timer mechanisms. The timer wheel (struct timer_list) is a hierarchical timing wheel with O(1) insertion, optimized for the common case where most timeouts are cancelled before they fire (TCP retransmits, device polling). It runs at jiffies granularity (typically 4ms). hrtimers use a per-CPU red-black tree sorted by expiry time, backed by the local APIC timer. They provide nanosecond resolution and power nanosleep(), POSIX timers, and the scheduler tick.
For event-driven servers, timerfd turns timers into file descriptors. timerfd_create() makes an fd. timerfd_settime() arms it. The fd becomes readable when the timer expires -- integrating naturally with epoll, select, or poll. No signal handlers. No async-signal-safe constraints. systemd's sd-event loop is built on timerfd; event libraries such as libuv get the same single-loop integration by deriving the epoll_wait timeout from their nearest pending timer.
Under the Hood
The vDSO makes clock_gettime essentially free. The kernel maps a read-only vvar page into every process, containing current time values and TSC calibration parameters. When clock_gettime(CLOCK_MONOTONIC) is called, the vDSO reads the vvar page, reads the TSC (~25 cycles), and computes the time. All in user space. ~20ns instead of ~200ns for a real syscall.
Timer slack saves power. The kernel rounds timer expiries to align with other pending timers, reducing CPU wakeups. Default slack: 50 microseconds, settable via prctl(PR_SET_TIMERSLACK). This is why a 1ms setTimeout in an application rarely fires at exactly 1ms. For real-time tasks the kernel forces slack to 0.
Tickless operation. By default, the kernel fires a tick at CONFIG_HZ frequency (250 Hz = every 4ms). CONFIG_NO_HZ_IDLE stops the tick when the CPU is idle (saves power). CONFIG_NO_HZ_FULL stops it even when one task is running -- eliminating jitter for latency-sensitive workloads (HFT, real-time audio).
Common Questions
Why does Linux have both CLOCK_MONOTONIC and CLOCK_BOOTTIME?
CLOCK_MONOTONIC stops counting during suspend. Setting a 10-minute timer before the laptop sleeps for 2 hours means the timer fires 10 minutes after wake -- 2 hours and 10 minutes after it was set. CLOCK_BOOTTIME fires immediately on wake because the 10 minutes elapsed during sleep. Android's AlarmManager uses CLOCK_BOOTTIME for wake-up alarms.
How does timerfd improve on signal-based timers?
Signal-based timers (timer_create with SIGEV_SIGNAL) deliver via signals, which have pitfalls: handlers must be async-signal-safe (no malloc, no printf), signals can be lost, and signal handling interacts poorly with threads. timerfd makes timers into file descriptors that integrate with the same epoll loop as sockets. No signal handlers. No races.
What causes clock_gettime to return non-monotonic values?
On older CPUs without constant_tsc, the TSC varies with CPU frequency. If a thread reads TSC on one core, migrates, and reads on another, the second reading can be lower. Modern CPUs with constant_tsc and nonstop_tsc guarantee synchronization. In VMs, the hypervisor must offset vCPU TSCs -- a known source of timekeeping issues during live migration.
What is resolution vs precision?
Resolution is the smallest increment the clock can represent (clock_getres returns 1ns). Precision is the actual accuracy -- it depends on TSC frequency, interrupt latency, and timer slack. A clock with 1ns resolution and 1us precision means readings are accurate to ~1us despite representing 1ns granularity.
How Technologies Use This
At 2 AM, NTP corrects the system clock backward by 10 seconds. Dozens of services with WatchdogSec suddenly think they have missed their deadlines, and systemd restarts them all simultaneously. The cascade of false watchdog kills takes down half the services on the machine.
The cause is using wall-clock time (CLOCK_REALTIME) for timeout calculations. When NTP adjusts the clock backward, every active deadline computed from wall-clock time shifts into the future or past. A 30-second watchdog timeout becomes a 40-second wait or appears to have already expired, depending on the direction of the adjustment.
systemd uses CLOCK_MONOTONIC for all internal deadlines including WatchdogSec, RestartSec, and OnBootSec, which is immune to NTP adjustments. Only OnCalendar= scheduling uses CLOCK_REALTIME, because calendar dates genuinely require wall-clock correlation. This design choice prevents a wave of spurious watchdog restarts every time NTP steps the clock.
A 10-second HTTP client timeout in a Go service occasionally takes 15 seconds to fire. The issue is intermittent, happening only at specific times of day, and there is no network delay or server-side slowdown to explain it.
The root cause is that if the runtime used CLOCK_REALTIME, an NTP backward adjustment of 5 seconds mid-wait silently extends every active timeout by 5 seconds. The timeout was counting wall-clock seconds, not elapsed seconds, so a clock correction during the wait period stretches the deadline without any notification.
Go avoids this entirely by using CLOCK_MONOTONIC for time.After, time.Ticker, and scheduler preemption, all read via the vDSO at roughly 20ns per call without any syscall. Go deliberately never uses CLOCK_REALTIME for timeouts, deadlines, or internal scheduling. In a microservice making 10K requests per second, this saves about 200K unnecessary kernel transitions per second compared to a real syscall approach.
Lock.tryLock returns immediately without waiting, Thread.sleep wakes early, and latency histograms report impossible sub-zero durations. The application behavior is non-deterministic and only occurs during specific time windows.
The cause is that System.nanoTime() went backward between two calls in the same thread. A backward clock jump makes tryLock compute a negative remaining wait time, causes sleep to think the target time has already passed, and produces negative latency measurements when the end timestamp is smaller than the start timestamp.
The JVM prevents this by mapping System.nanoTime() to CLOCK_MONOTONIC via the vDSO, which the JIT compiler inlines to roughly 25ns per call. System.currentTimeMillis() uses CLOCK_REALTIME only for log timestamps where wall-clock correlation matters. On a server calling nanoTime() 5 million times per second for metrics collection, the vDSO path avoids 5 million kernel transitions per second -- at ~180ns saved per call, close to a full CPU core's worth of time.
setTimeout(1, callback) fires after 4-6 milliseconds instead of 1, and timing-sensitive operations in Node.js are consistently late. Developers assume the event loop is overloaded, but even an idle Node.js process shows the same delay.
The kernel's timer slack intentionally rounds expiry times to align with other pending timers, reducing CPU wakeups at the cost of precision. On a default 250Hz kernel, the minimum effective granularity is 4ms regardless of the value passed to setTimeout. This is not a Node.js bug -- it is a deliberate kernel power-saving optimization.
libuv sidesteps the problem at the event-loop level: it keeps a min-heap of timers keyed on a cached CLOCK_MONOTONIC reading and passes the nearest expiry as the timeout argument to epoll_wait, so timers and socket I/O share one loop with no signals involved. process.hrtime.bigint() provides nanosecond-resolution monotonic readings via the vDSO at roughly 20ns per call with zero syscall cost, making it far more precise than millisecond-resolution Date.now() for benchmarking hot code paths.
Same Concept Across Tech
| Concept | Docker | JVM | Node.js | Go | K8s |
|---|---|---|---|---|---|
| Monotonic clock | Container shares host clocksource | System.nanoTime() maps to CLOCK_MONOTONIC via vDSO | process.hrtime.bigint() uses CLOCK_MONOTONIC | time.Now() uses CLOCK_MONOTONIC for monotonic component | Lease durations use monotonic clock |
| Wall clock | Container inherits host CLOCK_REALTIME | System.currentTimeMillis() uses CLOCK_REALTIME | Date.now() uses CLOCK_REALTIME | time.Now().Unix() uses CLOCK_REALTIME | Certificate expiry checks use wall clock |
| Timer precision | Limited by host CONFIG_HZ | ScheduledExecutorService wraps hrtimers | libuv timerfd + epoll; setTimeout min ~4ms | time.After uses runtime timers (~1ms min) | Health check probes have 1s minimum granularity |
| NTP vulnerability | All containers affected by host NTP | Only System.currentTimeMillis() affected | Only Date.now() affected | Only the wall-clock reading of time.Now() affected | Pod clocks drift if NTP not configured on nodes |
Stack Layer Mapping
| Layer | Component |
|---|---|
| Hardware | TSC (x86, ~1ns), HPET (~100ns), ACPI PM Timer (legacy) |
| Kernel clocksource | clocksource framework selects best hardware source |
| Kernel timers | timer_list wheel (jiffies, coarse) and hrtimer tree (ns, precise) |
| vDSO | vvar page + TSC read = ~20ns clock_gettime without syscall |
| Userspace | timerfd for event loops, POSIX timers for signal-based, clock_nanosleep for precise wakeup |
Design Rationale: Wall time and elapsed time are fundamentally different measurements -- one answers "what time is it?" and the other answers "how long has it been?" -- and conflating them causes real failures when NTP adjusts the clock. Separate clock IDs keep those concerns apart. The vDSO exists because reading the time is the single most frequent kernel interaction in most applications, and paying 200ns per call for a ring transition is absurd when a shared memory page and a TSC read can do it in 20ns. Timer slack rounds expiries to batch wakeups because waking the CPU once for five timers uses far less power than waking it five times -- the right tradeoff for everything except real-time workloads.
If You See This, Think This
| Symptom | Likely Cause | First Check |
|---|---|---|
| Timeouts fire 5-10 seconds late after NTP sync | Using CLOCK_REALTIME for deadline calculation | Audit code for CLOCK_REALTIME in timeout paths |
| setTimeout(1ms) fires at 4-6ms | Kernel CONFIG_HZ=250 and timer slack rounding | grep CONFIG_HZ /boot/config-$(uname -r) |
| Negative elapsed time measurements | TSC not synchronized across cores (VM or old CPU) | grep constant_tsc /proc/cpuinfo |
| Timer on laptop fires 2 hours late after resume | Using CLOCK_MONOTONIC instead of CLOCK_BOOTTIME | Switch to CLOCK_BOOTTIME for suspend-aware timers |
| Thousands of timer_create calls fail with EAGAIN | Too many POSIX timers per process | Replace with single timerfd + userspace wheel |
| clock_gettime takes 200ns instead of 20ns | vDSO not available or clocksource fell back to HPET | cat /sys/devices/system/clocksource/clocksource0/current_clocksource |
When to Use / Avoid
- Use CLOCK_MONOTONIC for all timeouts, deadlines, elapsed time measurements, and performance benchmarks
- Use CLOCK_REALTIME only for log timestamps, calendar scheduling, and human-readable wall clock display
- Use CLOCK_BOOTTIME for mobile wake-up alarms and any timer that must survive system suspend
- Use timerfd when timers need to integrate with epoll-based event loops (replaces signal-based POSIX timers)
- Avoid CLOCK_REALTIME for any duration calculation -- NTP adjustments will corrupt results
- Avoid creating thousands of POSIX timers per process -- use a single timerfd or userspace timing wheel
Try It Yourself
# Show the current clock source in use
cat /sys/devices/system/clocksource/clocksource0/current_clocksource 2>/dev/null && cat /sys/devices/system/clocksource/clocksource0/available_clocksource 2>/dev/null

# Show kernel timer frequency (CONFIG_HZ)
grep CONFIG_HZ /boot/config-$(uname -r) 2>/dev/null || echo 'Config not available'

# View all active timers in the kernel
sudo cat /proc/timer_list 2>/dev/null | head -40

# Measure clock_gettime resolution
python3 -c 'import time; times = [time.clock_gettime(time.CLOCK_MONOTONIC) for _ in range(10)]; diffs = [times[i]-times[i-1] for i in range(1,10)]; print(f"Min delta: {min(diffs)*1e9:.0f}ns, Mean: {sum(diffs)/len(diffs)*1e9:.0f}ns")' 2>/dev/null || echo 'python3 not available'

# Check TSC reliability flags
grep -oE '(constant_tsc|nonstop_tsc|tsc_reliable|tsc_known_freq)' /proc/cpuinfo 2>/dev/null | sort -u || echo 'Not x86 or no TSC flags'

# Show timer slack for current process
cat /proc/$$/timerslack_ns 2>/dev/null || echo 'timerslack_ns not available'

Debug Checklist
1. cat /sys/devices/system/clocksource/clocksource0/current_clocksource
2. grep -oE '(constant_tsc|nonstop_tsc)' /proc/cpuinfo | sort -u
3. cat /proc/timer_list | head -60
4. grep CONFIG_HZ /boot/config-$(uname -r) 2>/dev/null
5. cat /proc/$$/timerslack_ns
6. perf stat -e 'timer:*' -a -- sleep 1
Key Takeaways
- ✓ CLOCK_MONOTONIC never goes backward -- not during NTP adjustments, not during manual time changes. Use it for elapsed time and timeouts. CLOCK_REALTIME tracks wall clock time and CAN jump backward. Using it for timeouts causes hangs or early wakes.
- ✓ CLOCK_BOOTTIME includes time spent in suspend. A 10-minute CLOCK_MONOTONIC timer will not fire if the laptop sleeps for an hour. CLOCK_BOOTTIME will fire immediately on wake because the 10 minutes elapsed during sleep.
- ✓ The kernel tick runs at CONFIG_HZ (typically 250 Hz = 4ms). With CONFIG_NO_HZ_FULL, the tick stops entirely when one task is running -- reducing jitter for latency-sensitive workloads at the cost of slightly higher overhead when the tick fires.
- ✓ clock_gettime via the vDSO reads a shared page and the TSC -- no syscall, ~20ns. Most benchmarks measuring 'syscall overhead' with clock_gettime are actually measuring vDSO speed, not syscall cost.
- ✓ Timer slack rounds timer expiries to align with other timers, reducing CPU wakeups. Default is 50us for non-RT tasks. That is why setTimeout(1ms) rarely fires at 1ms. prctl(PR_SET_TIMERSLACK, 1) requests the minimum slack -- passing 0 resets to the default -- and the kernel already forces slack to 0 for real-time tasks.
Common Pitfalls
- ✗ Mistake: Using CLOCK_REALTIME for timeout calculations. Reality: NTP can adjust the clock backward, turning a 10-second timeout into a 15-second wait. Always use CLOCK_MONOTONIC for deadlines and elapsed time.
- ✗ Mistake: Expecting nanosleep() to wake at exactly the requested time. Reality: The kernel rounds to timer resolution and adds slack. On a 250 Hz kernel, nanosleep(1ms) typically sleeps for 4ms. Use clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME) for precise wakeups.
- ✗ Mistake: Creating thousands of POSIX timers per process. Reality: Each consumes kernel memory and a signal slot. Use a single timerfd with the nearest expiry, or a userspace timing wheel.
- ✗ Mistake: Assuming TSC is synchronized across CPU cores. Reality: Older or misconfigured systems (NUMA, VMs without TSC offsetting) have per-core TSC drift. Check for constant_tsc and nonstop_tsc CPU flags. Modern CPUs are safe.
Reference
In One Line
MONOTONIC for timeouts, REALTIME for display, BOOTTIME for mobile alarms -- picking the wrong clock corrupts deadlines silently and the bug only shows up when NTP corrects.