HyperLogLog & Cardinality Estimation
The Problem: Counting Unique Things at Scale
You want to know how many unique users visited your site today. With 10,000 users, store them in a HashSet. With 100 million users across 50 services, that HashSet eats gigabytes of RAM per counter. And you have hundreds of counters: unique users per page, per campaign, per country, per hour.
Exact counting at this scale is a memory problem disguised as a counting problem. HyperLogLog solves it by trading a small amount of accuracy for a massive reduction in memory. About 12KB per counter, regardless of whether you are counting thousands or billions of distinct elements.
How It Actually Works
Take any element and hash it to a uniformly distributed binary string. Now look at the leading zeros. If the hash starts with 0001..., that is a run of three zeros before the first 1. The key insight: the longer the run of leading zeros you have observed, the more distinct elements you have probably seen.
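Here is a minimal sketch of that observation, assuming a 64-bit hash. The truncated SHA-1 and the helper names are purely illustrative; real implementations use fast non-cryptographic hashes such as MurmurHash or xxHash.

```python
import hashlib

def hash64(element: str) -> int:
    # Illustrative 64-bit hash: the first 8 bytes of a SHA-1 digest.
    return int.from_bytes(hashlib.sha1(element.encode()).digest()[:8], "big")

def leading_zero_run(h: int, bits: int = 64) -> int:
    # Number of zero bits before the first 1 in a `bits`-wide value.
    return bits if h == 0 else bits - h.bit_length()

print(leading_zero_run(hash64("user-42")))  # zero-run length for one element's hash
```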
Think of it like coin flips. Getting one head in a row is common. Getting ten heads in a row means you have been flipping for a while. If someone tells you the longest streak of heads they observed was 20 in a row, you can estimate they flipped the coin roughly 2^20 times (about a million).
HyperLogLog improves on this raw idea in three ways.
Stochastic averaging. Instead of keeping one maximum, split elements into m buckets (called registers) using the first p bits of the hash. Each register independently tracks its own longest zero-run from the remaining bits. This reduces variance dramatically. With m = 16384 registers (p = 14 bits), you get 16384 independent experiments instead of one.
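A sketch of the update path under those assumptions: 64-bit hashes, p = 14, the top 14 bits pick the register, and the zero-run is measured over the remaining 50 bits. Names and hash choice are illustrative, not any particular library's API.

```python
import hashlib

P = 14                  # index bits
M = 1 << P              # 16384 registers
registers = [0] * M

def hash64(element: str) -> int:
    return int.from_bytes(hashlib.sha1(element.encode()).digest()[:8], "big")

def add(element: str) -> None:
    h = hash64(element)
    idx = h >> (64 - P)                      # top 14 bits choose the register
    rest = h & ((1 << (64 - P)) - 1)         # remaining 50 bits
    rho = (64 - P) - rest.bit_length() + 1   # position of the first 1-bit (1-based)
    registers[idx] = max(registers[idx], rho)
```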
Harmonic mean correction. Averaging the register values directly would give you a biased estimate. HyperLogLog uses the harmonic mean instead, which handles outliers much better. The formula is: E = alpha_m * m^2 / sum(2^(-M[j])) where M[j] is the value in register j and alpha_m is a bias correction constant that depends on m.
Small range and large range corrections. When the estimate is very low (below 5/2 * m), some registers might still be at zero, which means the harmonic mean estimate is too high. HyperLogLog switches to a linear counting method for this range. At the high end (above 2^32 / 30 for 32-bit hashes), hash collisions start to matter, and a large range correction kicks in. The HyperLogLog++ paper from Google mostly eliminates the need for the large range correction by using 64-bit hashes.
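Putting the pieces together, here is a sketch of the estimator with the small-range correction. The alpha constants come from the original paper; the large-range branch is omitted, which is what 64-bit-hash implementations like HLL++ effectively do.

```python
import math

def alpha(m: int) -> float:
    # Bias-correction constants from the original HyperLogLog paper.
    if m == 16:
        return 0.673
    if m == 32:
        return 0.697
    if m == 64:
        return 0.709
    return 0.7213 / (1 + 1.079 / m)

def estimate(registers: list[int]) -> float:
    m = len(registers)
    raw = alpha(m) * m * m / sum(2.0 ** -r for r in registers)  # harmonic-mean estimate
    if raw <= 2.5 * m:
        zeros = registers.count(0)
        if zeros:
            return m * math.log(m / zeros)  # small range: fall back to linear counting
    return raw  # with 64-bit hashes the large-range correction is effectively unnecessary
```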
Memory and Accuracy
Each register stores a single number: the longest zero-run seen (more precisely, the position of the first 1-bit). With a 64-bit hash and 14 bits spent on the register index, the run is measured over the remaining 50 bits, so the value never exceeds about 50 and fits in 6 bits. With m = 16384 registers, that is 16384 * 6 / 8 = 12288 bytes. About 12KB.
The standard error formula is 1.04 / sqrt(m):
| Registers (m) | Bits (p) | Memory | Standard Error |
|---|---|---|---|
| 256 | 8 | 192 bytes | 6.50% |
| 1024 | 10 | 768 bytes | 3.25% |
| 4096 | 12 | 3 KB | 1.63% |
| 16384 | 14 | 12 KB | 0.81% |
| 65536 | 16 | 48 KB | 0.41% |
Redis chose 16384 registers as the default for PFADD/PFCOUNT. That gives you 0.81% error in 12KB. Most analytics workloads do not need better than 1% accuracy for unique counts, so this lands in a sweet spot.
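With Redis it is a handful of commands. A quick sketch using the redis-py client; the key names here are made up for the example.

```python
import redis

r = redis.Redis()

# PFADD folds elements into the sketch stored at the key.
r.pfadd("uniques:2024-06-01", "user-1", "user-2", "user-3")
r.pfadd("uniques:2024-06-01", "user-2")   # duplicates do not change the count

print(r.pfcount("uniques:2024-06-01"))    # approximate distinct count
```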
The Merge Property
This is where HyperLogLog becomes genuinely powerful in distributed systems. Merging two HLL sketches is trivial: take the maximum of each corresponding register. If sketch A has register[5] = 7 and sketch B has register[5] = 9, the merged sketch has register[5] = 9.
This means you can:
- Run HLL on each partition of a sharded database, then merge to get the global count
- Maintain per-hour HLL sketches and merge them to get daily or weekly unique counts
- Compute the intersection of two HLL sketches (via inclusion-exclusion: |A union B| = |A| + |B| - |A intersect B|), though intersection estimates get noisy when the sets overlap significantly
Compare this to exact counting: to merge two exact sets, you need to transmit every element and deduplicate. With HLL, you transmit 12KB and take element-wise max.
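In code the merge is a one-liner over the register arrays. A minimal sketch; the comment shows the inclusion-exclusion estimate mentioned above, using an `estimate()` function like the one sketched earlier.

```python
def merge(a: list[int], b: list[int]) -> list[int]:
    # Union of two sketches: element-wise maximum of their registers.
    assert len(a) == len(b), "sketches must use the same register count"
    return [max(x, y) for x, y in zip(a, b)]

# Intersection via inclusion-exclusion (noisy when the overlap is small):
#   |A ∩ B| ≈ estimate(a) + estimate(b) - estimate(merge(a, b))
```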
Sparse vs Dense Representation
When an HLL sketch is first created, most registers are at zero. Storing all 16384 registers wastes memory. Redis and Google's HLL++ use a sparse representation for low-cardinality cases: they store only the non-zero registers as a sorted list of (index, value) pairs.
Once enough registers fill up (typically when the list exceeds ~40% of the dense size), the implementation switches to the dense 12KB array. This optimization makes HLL viable even when you have millions of counters, most of which track small cardinalities.
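A toy version of the sparse path, with an illustrative conversion threshold; the exact trigger and encoding vary by implementation.

```python
DENSE_BYTES = 16384 * 6 // 8            # 12 KB dense array
SPARSE_LIMIT = int(0.4 * DENSE_BYTES)   # illustrative switch-over point

sparse: dict[int, int] = {}             # register index -> value, non-zero entries only

def sparse_add(idx: int, rho: int) -> None:
    if rho > sparse.get(idx, 0):
        sparse[idx] = rho

def should_convert(bytes_per_entry: int = 4) -> bool:
    # Convert to the dense array once the sparse encoding stops saving memory.
    return len(sparse) * bytes_per_entry > SPARSE_LIMIT

def to_dense(m: int = 16384) -> list[int]:
    dense = [0] * m
    for idx, rho in sparse.items():
        dense[idx] = rho
    return dense
```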
When to Use HLL
Use it when you need to count distinct elements at scale and can tolerate about 1% error. Unique visitors, unique IP addresses, unique search queries, unique events. Anything where "approximately 14.2 million" is as useful as "exactly 14,198,347."
Do not use it when you need exact counts (billing, compliance reporting), when you need to know which elements are in the set (use a Bloom filter or an actual set), or when your cardinality is small enough that exact counting is cheap (under a few million, just use a set).
Do not use it for intersection estimation when the sets have very different sizes or low overlap. The inclusion-exclusion formula amplifies errors. If set A has 100 million elements and set B has 200 million, but they only share 1000, the intersection estimate will be dominated by noise. For intersection, consider MinHash or purpose-built sketch structures.
Variants Worth Knowing
HyperLogLog++ (Google, 2013) improved the original with 64-bit hashes (eliminating the large range correction), a more accurate bias correction for medium cardinalities using empirical data rather than a formula, and the sparse representation. This is what most modern implementations use.
Streaming HLL implementations in Flink, Kafka Streams, and Spark process elements in a single pass with constant memory. The sketch updates are associative and commutative, which means they parallelize perfectly across stream partitions.
LogLog-Beta (2016) is a simpler variant that achieves similar accuracy to HLL++ without the piecewise bias correction. It keeps the harmonic mean but replaces the range-specific corrections with a single formula based on the number of empty registers, which naturally handles small and large ranges. Not widely adopted yet, but worth tracking.
Key Points
- Estimates the count of distinct elements in a stream using roughly 12KB of memory regardless of cardinality. That is 12KB for 1 million uniques or 1 billion uniques
- Standard error of 1.04/sqrt(m) where m is the number of registers. With 16384 registers (the Redis default), you get about 0.81% error
- The core insight: the longest run of leading zeros in a hash tells you something about how many distinct values you have seen. More distinct values means longer runs become more likely
- HyperLogLog structures are fully mergeable. You can compute cardinality of a union by merging HLL sketches from different servers without touching the original data
- Redis PFCOUNT, ClickHouse uniqHLL12, BigQuery APPROX_COUNT_DISTINCT, and Presto approx_distinct all use HyperLogLog under the hood
Common Mistakes
- ✗ Using exact COUNT(DISTINCT) on datasets above 100M rows when approximate is acceptable. A query that takes 45 seconds with exact counting takes 200ms with HLL
- ✗ Not understanding that HLL cannot tell you WHICH elements are in the set, only how many. If you need membership testing, you want a Bloom filter instead
- ✗ Forgetting that the error is relative to the true cardinality. 0.81% error on 1 billion uniques means you could be off by 8 million. For dashboards that is fine. For billing, it is not
- ✗ Using too few registers to save memory. Going from 16384 to 256 registers saves 12KB but increases error from 0.81% to 6.5%. The memory savings is almost never worth the accuracy loss