Hybrid Logical Clocks (HLC)
Architecture
The Gap Between Physical and Logical Clocks
Physical clocks (wall clocks synchronized by NTP) give you real time but no causal guarantees. Event A at 10:00:00.001 might have happened before or after event B at 10:00:00.002 depending on clock skew between machines. NTP keeps clocks within a few milliseconds, but "within a few milliseconds" is not the same as "exactly synchronized."
Logical clocks (Lamport clocks, vector clocks) give you perfect causal ordering but no connection to real time. Lamport timestamp 42 tells you nothing about when the event actually happened. You cannot use it for TTLs, lease expiration, or anything that needs wall-clock semantics.
HLC, introduced by Kulkarni, Demirbas, Madappa, Avva, and Leone in 2014, bridges this gap. An HLC timestamp has two components:
- pt (physical): tracks the maximum physical time seen
- l (logical): breaks ties when pt is the same
The resulting timestamp behaves like a logical clock (preserves causal ordering) while staying close to the physical clock (useful for time-based reasoning).
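As a minimal sketch of the two-component timestamp, the Python below models it as a named tuple. The class name HLCTimestamp and the millisecond unit are assumptions made for illustration, not something the original paper prescribes. Tuple ordering gives the comparison described above: pt first, then l.

```python
from typing import NamedTuple


class HLCTimestamp(NamedTuple):
    """Illustrative HLC timestamp; the class name is an assumption."""
    pt: int  # physical component, here milliseconds since the epoch
    l: int   # logical component, breaks ties when pt is equal


# NamedTuple inherits tuple ordering: compare pt first, then l.
a = HLCTimestamp(pt=1_700_000_000_000, l=3)
b = HLCTimestamp(pt=1_700_000_000_000, l=4)
c = HLCTimestamp(pt=1_700_000_000_001, l=0)

ordered = sorted([c, b, a])
```

Here a and b share a physical component, so the logical component decides their order; c has a later physical component and sorts last regardless of its logical value.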
The HLC Algorithm
Each node maintains two values, pt (the physical component of its HLC) and l (the logical component), and has access to its local physical clock through now().
Local Event or Send
When a node creates a local event or sends a message:
if now() > pt:
    pt = now()
    l = 0
else:
    l = l + 1
timestamp = (pt, l)
If the physical clock has advanced past pt, reset the logical counter and use the new physical time. If it has not (same millisecond, the clock went backward, or pt was pushed ahead by an earlier message), keep pt and increment the logical counter.
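The rule can be sketched as a pure Python function. The function name local_event and the choice to pass the clock reading in as a parameter (rather than calling now() inside) are assumptions made so the example is deterministic.

```python
def local_event(pt, l, now):
    """Return the new (pt, l) for a local event or send, given clock reading `now`."""
    if now > pt:
        return now, 0   # physical clock advanced: adopt it and reset the counter
    return pt, l + 1    # clock stalled or behind pt: keep pt, bump the counter


state = (0, 0)
first = state = local_event(*state, now=100)    # clock advanced
second = state = local_event(*state, now=100)   # same millisecond
third = state = local_event(*state, now=95)     # clock stepped backward
```

The three calls produce (100, 0), (100, 1), and (100, 2): the timestamp keeps increasing even when the clock reading does not.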
Receive Message
When a node receives a message with timestamp (msg_pt, msg_l):
old_pt = pt
pt = max(now(), pt, msg_pt)
if pt == old_pt and pt == msg_pt:
    l = max(l, msg_l) + 1
elif pt == old_pt:
    l = l + 1
elif pt == msg_pt:
    l = msg_l + 1
else:
    l = 0
timestamp = (pt, l)
The logic ensures: (1) pt never goes backward, (2) the timestamp is always strictly greater than any timestamp that causally precedes it, including the one on the received message, and (3) pt stays as close as possible to the actual physical time.
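A sketch of the receive rule in Python, with one call per branch. The function name receive and the explicit now parameter are assumptions for illustration.

```python
def receive(pt, l, msg_pt, msg_l, now):
    """Return the new (pt, l) after receiving a message stamped (msg_pt, msg_l)."""
    new_pt = max(now, pt, msg_pt)
    if new_pt == pt and new_pt == msg_pt:
        new_l = max(l, msg_l) + 1   # local and message tie on pt: exceed both counters
    elif new_pt == pt:
        new_l = l + 1               # local pt is the maximum
    elif new_pt == msg_pt:
        new_l = msg_l + 1           # message pt is the maximum
    else:
        new_l = 0                   # local physical clock is ahead of both
    return new_pt, new_l


# One case per branch, starting from local state (100, 2).
tie = receive(100, 2, msg_pt=100, msg_l=5, now=90)
local_ahead = receive(100, 2, msg_pt=80, msg_l=9, now=90)
msg_ahead = receive(100, 2, msg_pt=120, msg_l=7, now=90)
clock_ahead = receive(100, 2, msg_pt=120, msg_l=7, now=130)
```

In every case the result is strictly greater than both the previous local timestamp and the message timestamp, which is what preserves causal order.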
Why This Works
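HLC preserves the happens-before relation: if event A causally precedes event B, then HLC(A) < HLC(B). The proof follows from the update rules: every event produces a timestamp strictly greater than the node's previous timestamp and, on receive, strictly greater than the timestamp on the message. The converse does not hold: HLC(A) < HLC(B) does not imply that A causally precedes B.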
HLC timestamps also stay within epsilon of real time, where epsilon is the maximum clock skew across the system. Specifically, pt - now() <= epsilon at any point. This means HLC timestamps are usable for time-based operations (TTLs, lease expiration) with an error margin of the clock skew.
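Both properties can be checked in a small simulation. The sketch below assumes two nodes whose clocks differ by a fixed 5 ms, zero-latency message delivery, and a hypothetical Node class; it is an illustration of the bound, not a proof.

```python
import random

EPSILON = 5  # assumed skew: node b's clock runs 5 ms ahead of node a's


class Node:
    def __init__(self, skew):
        self.skew, self.pt, self.l = skew, 0, 0

    def now(self, real_time):
        return real_time + self.skew   # local clock = real time + fixed skew

    def send(self, real_time):
        now = self.now(real_time)
        if now > self.pt:
            self.pt, self.l = now, 0
        else:
            self.l += 1
        return self.pt, self.l

    def receive(self, msg, real_time):
        msg_pt, msg_l = msg
        old_pt, now = self.pt, self.now(real_time)
        self.pt = max(now, old_pt, msg_pt)
        if self.pt == old_pt and self.pt == msg_pt:
            self.l = max(self.l, msg_l) + 1
        elif self.pt == old_pt:
            self.l += 1
        elif self.pt == msg_pt:
            self.l = msg_l + 1
        else:
            self.l = 0
        return self.pt, self.l


random.seed(0)
a, b = Node(skew=0), Node(skew=EPSILON)
max_drift = 0
for real_time in range(1, 1000):
    sender, receiver = random.choice([(a, b), (b, a)])
    sent = sender.send(real_time)
    got = receiver.receive(sent, real_time)   # zero-latency delivery
    assert sent < got                         # send causally precedes receive
    for node in (a, b):
        max_drift = max(max_drift, node.pt - node.now(real_time))
```

The drift of pt above each node's own clock never exceeds the skew between the two clocks: node a inherits node b's faster clock through messages, but nothing pushes pt further ahead than that.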
The combination is powerful: causal ordering for correctness, approximate physical time for usability.
CockroachDB: HLC in Production
CockroachDB is the most prominent production user of HLC and provides a concrete case study of how the theory translates to practice.
MVCC Timestamps
Every key-value pair in CockroachDB has an HLC timestamp. When a transaction writes a key, the write gets the transaction's commit timestamp. When reading, the system retrieves the version with the highest timestamp at or before the read timestamp.
HLC timestamps enable snapshot isolation and serializable isolation. A transaction reads a consistent snapshot defined by its timestamp. Writes are ordered by their HLC timestamps.
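The read rule can be sketched with a sorted list of versions and a binary search. The data layout and the function name mvcc_read are assumptions for illustration and do not reflect CockroachDB's actual storage format.

```python
import bisect

# Versions of one key, sorted by HLC timestamp (pt, l). Values are illustrative.
versions = [
    ((100, 0), "v1"),
    ((105, 2), "v2"),
    ((110, 0), "v3"),
]


def mvcc_read(versions, read_ts):
    """Return the value of the newest version with timestamp <= read_ts, or None."""
    timestamps = [ts for ts, _ in versions]
    idx = bisect.bisect_right(timestamps, read_ts)  # versions[:idx] are <= read_ts
    return versions[idx - 1][1] if idx else None


at_v2 = mvcc_read(versions, (105, 2))       # exactly at v2's timestamp
just_before = mvcc_read(versions, (105, 1)) # same pt, smaller logical counter
too_early = mvcc_read(versions, (99, 0))    # before any version exists
```

A read at (105, 1) sees v1 rather than v2: the logical component matters when the physical components are equal.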
The Uncertainty Interval
Here is where bounded clock skew matters. When a transaction at timestamp T reads a key, it might find a value with a timestamp between T and T + max_offset (the configured maximum clock skew, default 500ms in CockroachDB). This value might have been written by a transaction that committed before T in real time but got a higher HLC timestamp due to clock skew.
CockroachDB handles this with an uncertainty interval. If a read at T encounters a value in the uncertainty window (T, T + max_offset], it cannot be sure whether the write truly happened before or after the read. In this case, CockroachDB restarts the transaction at a higher timestamp (above the uncertain value).
This is the practical cost of clock skew. A lower max_offset means smaller uncertainty intervals, fewer transaction restarts, and better performance, but it is only safe if the real skew stays within that bound. CockroachDB therefore recommends tight clock synchronization via NTP.
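A simplified sketch of the uncertainty check, using only the physical component for brevity. The exception name UncertaintyRestart and the function read_with_uncertainty are hypothetical and are not CockroachDB identifiers.

```python
class UncertaintyRestart(Exception):
    """Hypothetical signal that the reader must retry at a higher timestamp."""

    def __init__(self, retry_pt):
        super().__init__(f"retry above pt={retry_pt}")
        self.retry_pt = retry_pt


def read_with_uncertainty(versions, read_pt, max_offset):
    """versions: list of (pt, value) sorted by pt."""
    visible = None
    for pt, value in versions:
        if pt <= read_pt:
            visible = value                 # at or before the read: visible
        elif pt <= read_pt + max_offset:
            raise UncertaintyRestart(pt)    # may have committed before the read
        # pt > read_pt + max_offset: definitely after the read, ignore
    return visible


versions = [(1000, "old"), (1200, "new")]
try:
    read_with_uncertainty(versions, read_pt=1100, max_offset=500)
    restarted_at = None
except UncertaintyRestart as exc:
    restarted_at = exc.retry_pt

tight_bound = read_with_uncertainty(versions, read_pt=1100, max_offset=50)
```

With max_offset at 500 the read at 1100 must restart because the write at 1200 falls inside the window; with max_offset at 50 the same read returns "old" without a restart.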
Why Not Just TrueTime?
Google Spanner uses TrueTime, which provides explicit confidence intervals on the current time using atomic clocks and GPS receivers. TrueTime can say "the current time is between 10:00:00.001 and 10:00:00.003" with high confidence. This lets Spanner wait out the uncertainty (a "commit wait" of a few milliseconds) and guarantee external consistency.
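The commit-wait idea can be sketched as follows. The function tt_now is a hypothetical stand-in built on the local clock with an assumed 4 ms uncertainty; it is not the real TrueTime API, which is internal to Google.

```python
import time

UNCERTAINTY_S = 0.004  # assumed half-width of the clock uncertainty, for illustration


def tt_now():
    """Hypothetical TrueTime-style call: returns (earliest, latest) bounds on now."""
    t = time.time()
    return t - UNCERTAINTY_S, t + UNCERTAINTY_S


def commit_wait(commit_ts):
    """Block until commit_ts is guaranteed to be in the past."""
    while tt_now()[0] <= commit_ts:
        time.sleep(0.001)


commit_ts = tt_now()[1]   # choose the latest possible current time as commit timestamp
commit_wait(commit_ts)    # waits roughly 2 * UNCERTAINTY_S
earliest_after_wait = tt_now()[0]
```

Once the wait returns, even the earliest possible current time is past the commit timestamp, so any transaction that starts afterward is ordered after this one in real time.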
CockroachDB uses HLC instead of TrueTime because TrueTime requires specialized hardware (atomic clocks and GPS receivers) that is only available in Google's infrastructure. HLC works with commodity NTP, making CockroachDB deployable on any cloud or bare-metal setup. The trade-off: larger uncertainty intervals (a default max_offset of 500ms vs. TrueTime's uncertainty of roughly 7ms) and occasional transaction restarts.
YugabyteDB: Safe Time
YugabyteDB uses a variant of HLC for its "safe time" mechanism. Safe time is the timestamp up to which a node can serve consistent reads without needing to contact other nodes.
A node's safe time advances when it receives heartbeats from the leader with the leader's current HLC timestamp. Any read at a timestamp below safe time is guaranteed to see all committed writes up to that point.
The interplay between HLC and safe time gives YugabyteDB low-latency reads from followers without sacrificing consistency. The follower knows "I have all data up to safe time T" because the leader's HLC timestamp establishes a causal boundary.
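A toy model of a follower that tracks safe time is sketched below. The Follower class and its method names are assumptions made for illustration; they are not YugabyteDB APIs, and the sketch ignores how replication actually delivers writes.

```python
class Follower:
    """Illustrative follower replica; names are assumptions, not YugabyteDB APIs."""

    def __init__(self):
        self.safe_time = (0, 0)   # HLC timestamp (pt, l)
        self.data = {}            # key -> list of (timestamp, value)

    def apply(self, key, ts, value):
        # A replicated write arrives from the leader.
        self.data.setdefault(key, []).append((ts, value))

    def on_heartbeat(self, leader_hlc):
        # Safe time only moves forward.
        self.safe_time = max(self.safe_time, leader_hlc)

    def read(self, key, read_ts):
        if read_ts > self.safe_time:
            raise RuntimeError("read_ts is beyond safe time; wait or ask the leader")
        older = [(ts, v) for ts, v in self.data.get(key, []) if ts <= read_ts]
        return max(older, key=lambda entry: entry[0])[1] if older else None


f = Follower()
f.apply("k", (100, 0), "a")
f.on_heartbeat((100, 0))
local_read = f.read("k", (100, 0))   # served locally, no round trip to the leader

try:
    f.read("k", (110, 0))            # beyond safe time
    refused = False
except RuntimeError:
    refused = True
```

The follower answers reads at or below its safe time from local data and refuses anything newer, which is the property that makes follower reads consistent.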
HLC vs. Other Clock Approaches
vs. Lamport Clocks
Lamport clocks are simpler (a single integer counter) but have no physical time information. HLC adds physical time awareness with minimal extra complexity (one more counter). For systems that never need wall-clock semantics, Lamport clocks suffice. For databases with TTLs, lease management, or user-facing timestamps, HLC is worth the small additional cost.
vs. Vector Clocks
Vector clocks can detect concurrent events (neither causally ordered). HLC cannot, because it gives a total order (compare pt first, then l). Vector clocks grow with the number of nodes (O(n) space). HLC is constant size. For systems with hundreds or thousands of nodes, the space difference is significant.
For a replicated key-value store where you need to detect concurrent writes for conflict resolution, vector clocks (or dotted version vectors) are the right choice. For a distributed SQL database where you need timestamp-based MVCC, HLC is the right choice.
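The difference shows up directly in the comparison functions. In the sketch below, vc_compare is a hypothetical helper over vector clocks stored as dicts, and the HLC timestamps are plain (pt, l) tuples.

```python
def vc_compare(a, b):
    """Compare two vector clocks (dicts mapping node id -> counter)."""
    nodes = a.keys() | b.keys()
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


# Two writes made independently on nodes n1 and n2.
w1 = {"n1": 2, "n2": 0}
w2 = {"n1": 1, "n2": 1}
vc_result = vc_compare(w1, w2)   # neither clock dominates the other

# The same two writes stamped with HLC always compare one way or the other.
h1, h2 = (100, 0), (100, 1)
hlc_result = "before" if h1 < h2 else "after"
```

The vector clocks report the writes as concurrent, which a conflict-resolution layer can act on. The HLC comparison picks a winner, which is what timestamp-ordered MVCC needs but hides the concurrency.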
vs. TrueTime
TrueTime provides hard bounds on clock uncertainty using specialized hardware. HLC provides soft bounds using NTP. TrueTime enables commit-wait for external consistency (Spanner). HLC uses uncertainty intervals and transaction restarts (CockroachDB).
TrueTime gives stronger guarantees but requires Google-grade infrastructure. HLC is the pragmatic alternative for everyone else.
NTP Considerations
HLC's guarantees depend on NTP keeping clocks reasonably synchronized. In practice:
Public NTP (pool.ntp.org): clock skew typically 1-10ms, sometimes up to 100ms during network issues. This is fine for most applications but leaves little room to lower CockroachDB's max_offset, so uncertainty intervals stay large.
Cloud provider NTP (Amazon Time Sync, Google Cloud NTP): clock skew typically under 1ms. Much better for HLC-based systems. CockroachDB on AWS with Amazon Time Sync sees very few uncertainty restarts.
Chrony vs. ntpd: Chrony generally achieves tighter synchronization than traditional ntpd, especially after network disruptions. Most modern Linux distributions default to Chrony.
Clock jumps: NTP can step the clock forward or backward if the drift is too large. A backward jump is dangerous for HLC because pt might suddenly be far ahead of now(). HLC handles this gracefully (pt never decreases, logical counter absorbs the gap), but the logical counter grows until real time catches up with pt.
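A short simulation of a backward step shows the counter absorbing the gap. The 50 ms step size and the rate of one event per millisecond are assumptions chosen for illustration.

```python
pt, l = 10_000, 0      # HLC state just before the step
clock = 10_000 - 50    # NTP steps the local clock back by 50 ms
peak_l = 0

for _ in range(60):    # one local event per millisecond of local time
    clock += 1
    if clock > pt:
        pt, l = clock, 0   # clock has caught up: counter resets
    else:
        l += 1             # clock still behind pt: counter absorbs the gap
    peak_l = max(peak_l, l)
```

The counter climbs to 50 while the clock is behind pt, then resets to 0 once the clock passes it. At higher event rates the counter grows proportionally faster, which matters if the logical component has a fixed bit width.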
Monitoring clock skew in production (using metrics from Chrony or NTP) is essential for HLC-based systems. Alert when skew approaches your configured max_offset, not only once it exceeds it, and investigate before it causes problems.
Key Points
- HLC combines a physical timestamp with a logical counter to get the best of both worlds. Events get timestamps that are close to real wall-clock time (useful for humans and TTLs) while still maintaining the causal ordering guarantees of logical clocks.
- The physical component tracks the maximum physical time seen across all messages. The logical component breaks ties when multiple events have the same physical time. Together, they form a timestamp that is always within clock skew of the actual physical time.
- CockroachDB uses HLC timestamps for MVCC versioning and serializable isolation. Every transaction gets an HLC timestamp, and CockroachDB uses the bounded clock skew to define uncertainty intervals for reads.
- Unlike vector clocks, HLC timestamps have a fixed size (one physical counter + one logical counter) regardless of the number of nodes. This makes them practical for systems with thousands of nodes where vector clock metadata would be prohibitively large.
- The key assumption is bounded clock skew. NTP typically keeps clocks within a few milliseconds of each other. Systems built on HLC use this bound to decide whether a write with a slightly higher timestamp might actually have happened earlier in real time.
Common Mistakes
- Treating HLC timestamps as exact wall-clock times. The physical component is never behind the issuing node's local clock reading at the time of the event, but it can be ahead of it (because it tracks the maximum physical time seen in messages). Using HLC timestamps for user-facing time display works most of the time but can show slightly future times.
- Ignoring the clock skew bound. HLC-based systems depend on NTP keeping clocks within the configured bound. If NTP fails or clock skew exceeds that bound, the uncertainty interval in systems like CockroachDB no longer covers the real skew, so a read can miss a write that committed earlier in real time. CockroachDB nodes shut themselves down when they detect that their clock offset is approaching the bound.
- Assuming HLC order reflects real-time order the way Google TrueTime does. HLC guarantees that if A causally precedes B then HLC(A) < HLC(B), but HLC(A) < HLC(B) on its own only tells you that A happened before B or that the two are concurrent. TrueTime's bounded intervals let you conclude that A definitely happened before B in real time when the intervals do not overlap. HLC gives a causally consistent order with physical time hints, not a real-time order.
- Not persisting the HLC state across restarts. If a node restarts with its HLC reset to current physical time, it might issue timestamps lower than timestamps it issued before the restart (for example, when pt was ahead of the local clock). This can break causal ordering. Persist the logical counter and last known physical time.
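A minimal sketch of persisting and restoring HLC state, assuming a JSON file on local disk. The file layout and the function names save_hlc and restore_hlc are hypothetical; a real system would likely persist an upper bound periodically rather than write on every event.

```python
import json
import os
import tempfile


def save_hlc(path, pt, l):
    """Write the state to a temp file, then rename, so a crash cannot leave a torn file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"pt": pt, "l": l}, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def restore_hlc(path, now):
    """Return a starting (pt, l) that is never at or below the persisted state."""
    try:
        with open(path) as f:
            saved = json.load(f)
    except FileNotFoundError:
        return now, 0                      # first start: nothing to protect
    if now > saved["pt"]:
        return now, 0                      # clock is already past the saved state
    return saved["pt"], saved["l"] + 1     # clock is behind: continue from saved state


path = os.path.join(tempfile.mkdtemp(), "hlc.json")
save_hlc(path, pt=5_000, l=7)
restored = restore_hlc(path, now=4_900)    # local clock is behind the saved pt
```

After the restart the node resumes at (5000, 8), which is strictly above the last timestamp it issued even though its local clock reads 4900.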