Gorilla Compression (Delta-of-Delta + XOR)
The Problem: 16 Bytes Per Sample Adds Up Fast
A time-series sample is a timestamp-value pair: an int64 timestamp (8 bytes) and a float64 value (8 bytes). That is 16 bytes per sample, uncompressed.
For a monitoring system scraping 1 million time series every 15 seconds, that is 67,000 samples per second, or 5.76 billion samples per day. At 16 bytes each: 92 GB/day for a medium-sized deployment. At hyperscale (500M samples/sec): 43.2 trillion samples/day = 691 TB/day. No storage budget survives those numbers.
Generic compression algorithms help. LZ4 achieves 3-4x on time-series data. ZSTD gets 5-7x. But Gorilla achieves 12x, because it exploits two properties of time-series data that general-purpose compressors cannot see.
Timestamp Compression: Delta-of-Delta
The Insight
Metric scraping happens at fixed intervals. If you scrape every 15 seconds, consecutive timestamps differ by exactly 15,000 milliseconds. The delta is constant. And the delta of the delta is zero.
Worked Example
Consider six consecutive timestamps from a 15-second scrape:
Raw timestamps (unix ms):
T0 = 1709827200000
T1 = 1709827215000
T2 = 1709827230000
T3 = 1709827245000
T4 = 1709827260000
T5 = 1709827275000
Deltas (T[n] - T[n-1]):
D1 = 15000
D2 = 15000
D3 = 15000
D4 = 15000
D5 = 15000
Delta-of-deltas (D[n] - D[n-1]):
DD2 = 0
DD3 = 0
DD4 = 0
DD5 = 0
Every delta-of-delta is zero. Gorilla uses variable-length bit encoding for these values:
DD = 0 → 1 bit (just a '0' flag)
DD in [-63, 64] → 9 bits (2-bit header + 7-bit value)
DD in [-255, 256] → 12 bits (3-bit header + 9-bit value)
DD in [-2047, 2048] → 16 bits (4-bit header + 12-bit value)
Otherwise → 36 bits (4-bit header + 32-bit value)
With regular intervals, every timestamp after the second costs 1 bit. The first timestamp is stored in full (64 bits), the second stores the delta (14 bits), and every subsequent timestamp is a single bit.
What Happens When a Scrape Is Late?
If the fourth scrape arrives 1 second late (at T3 = 1709827246000 instead of 1709827245000):
Deltas:
D3 = 16000 (instead of 15000)
D4 = 14000 (the next scrape is back on schedule)
D5 = 15000 (normal again)
Delta-of-deltas:
DD3 = 16000 - 15000 = 1000 → 16 bits (fits in [-2047, 2048])
DD4 = 14000 - 16000 = -2000 → 16 bits
DD5 = 15000 - 14000 = 1000 → 16 bits
One late scrape ripples through three delta-of-deltas, costing 48 bits where 3 bits would have sufficed. This is why consistent scrape intervals matter: they are not just an operational nicety, they directly control compression efficiency.
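To make the cost model concrete, here is a minimal Go sketch (an illustration of the bucket scheme above, not Prometheus's or the paper's actual encoder) that counts the bits a timestamp stream would consume and reproduces both the regular and the late-scrape numbers:

```go
package main

import "fmt"

// dodBits returns the bits a Gorilla-style encoder spends on one
// delta-of-delta, using the bucket scheme described above.
func dodBits(dod int64) int {
	switch {
	case dod == 0:
		return 1 // '0' flag
	case dod >= -63 && dod <= 64:
		return 9 // '10' header + 7-bit value
	case dod >= -255 && dod <= 256:
		return 12 // '110' header + 9-bit value
	case dod >= -2047 && dod <= 2048:
		return 16 // '1110' header + 12-bit value
	default:
		return 36 // '1111' header + 32-bit value
	}
}

// timestampBits sums the cost of a timestamp stream: 64 bits for the base
// timestamp, 14 bits for the first delta (the paper's choice; assumes the
// first delta fits in 14 bits), then one bucket per later timestamp.
func timestampBits(ts []int64) int {
	total := 64
	if len(ts) < 2 {
		return total
	}
	total += 14
	prevDelta := ts[1] - ts[0]
	for i := 2; i < len(ts); i++ {
		delta := ts[i] - ts[i-1]
		total += dodBits(delta - prevDelta)
		prevDelta = delta
	}
	return total
}

func main() {
	regular := []int64{1709827200000, 1709827215000, 1709827230000,
		1709827245000, 1709827260000, 1709827275000}
	late := []int64{1709827200000, 1709827215000, 1709827230000,
		1709827246000, 1709827260000, 1709827275000}
	fmt.Println("regular scrapes:", timestampBits(regular), "bits") // 82 = 64 + 14 + 4×1
	fmt.Println("one late scrape:", timestampBits(late), "bits")    // 127 = 64 + 14 + 1 + 3×16
}
```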
Value Compression: XOR Encoding
The Insight
Consecutive metric values tend to be similar. CPU utilization might be 47.2%, 47.3%, 47.3%, 47.2%. A counter that has not incremented has the exact same value. When you XOR two identical IEEE 754 float64 representations, you get all zeros. When you XOR two similar values, most bits are the same and the XOR result has long runs of zeros.
Worked Example
Consider five consecutive CPU utilization readings:
Values: 47.2, 47.3, 47.3, 47.2, 48.1
IEEE 754 hex representation:
V0 = 0x4047999999999999 (47.2 — stored in full, 64 bits)
V1 = 0x4047A66666666666 (47.3)
V2 = 0x4047A66666666666 (47.3)
V3 = 0x4047999999999999 (47.2)
V4 = 0x40480CCCCCCCCCCD (48.1)
XOR with previous:
V1 ⊕ V0 = 0x00003FFFFFFFFFFF → nonzero, 46 meaningful bits
V2 ⊕ V1 = 0x0000000000000000 → zero! → 1 bit
V3 ⊕ V2 = 0x00003FFFFFFFFFFF → nonzero, same 46-bit pattern as before
V4 ⊕ V3 = 0x000F955555555554 → nonzero, 50 meaningful bits in a new position
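These bit patterns are easy to check. A few lines of Go (purely illustrative) print each reading's IEEE 754 representation and its XOR with the previous value:

```go
package main

import (
	"fmt"
	"math"
	"math/bits"
)

// Print each reading's IEEE 754 bit pattern and its XOR with the previous
// value, reproducing the table above. A zero XOR reports 64 leading and
// 64 trailing zeros.
func main() {
	values := []float64{47.2, 47.3, 47.3, 47.2, 48.1}
	prev := math.Float64bits(values[0])
	fmt.Printf("V0 = 0x%016X (%g)\n", prev, values[0])
	for i := 1; i < len(values); i++ {
		cur := math.Float64bits(values[i])
		xor := cur ^ prev
		fmt.Printf("V%d XOR V%d = 0x%016X (leading zeros: %d, trailing zeros: %d)\n",
			i, i-1, xor, bits.LeadingZeros64(xor), bits.TrailingZeros64(xor))
		prev = cur
	}
}
```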
The encoding for XOR results:
XOR = 0 (identical values) → 1 bit (just a '0' flag)
XOR ≠ 0, same leading/trailing zeros as previous → 2 + meaningful bits
XOR ≠ 0, different zero pattern → 2 + 5 + 6 + meaningful bits
The "leading/trailing zero" optimization is key. When V1 XOR V0 produces 0x00003FFFFFFFFFFF, there are 18 leading zero bits and 0 trailing zero bits. The encoder stores only the 46 "meaningful" bits plus metadata indicating where they sit. If the next XOR has the same zero pattern (which consecutive similar values often do), the metadata is skipped entirely.
Decimal readings like 47.2 and 47.3 have long repeating binary fractions, so their XOR results are wide; real metrics are usually kinder. Facebook's Gorilla paper measured production data and found that 51% of values XOR to exactly zero (cost: 1 bit each), about 30% compress to ~26.6 bits, and the remaining 19% need ~36.9 bits. Overall, the paper reports an average of 1.37 bytes (roughly 11 bits) per complete sample, timestamp included.
Putting It Together: End-to-End Example
Take a single time series cpu_usage{host="web-01"} with 8 samples at 15-second intervals, where CPU hovers around 47%:
Sample Timestamp (ms) Value TS bits Value bits
------ ----------------- ------ -------- ----------
0 1709827200000 47.2 64 64 (stored in full)
1 1709827215000 47.3 14 59 (delta + new XOR window, 46 meaningful bits)
2 1709827230000 47.3 1 1 (DD=0, XOR=0)
3 1709827245000 47.2 1 48 (DD=0, XOR reuses the 46-bit window)
4 1709827260000 47.3 1 48 (DD=0, XOR reuses the window)
5 1709827275000 47.3 1 1 (DD=0, XOR=0)
6 1709827290000 47.2 1 48 (DD=0, XOR reuses the window)
7 1709827305000 48.1 1 63 (DD=0, new XOR window)
----- -----
Total: 84 bits 332 bits = 416 bits = 52 bytes
Uncompressed: 8 samples × 16 bytes = 128 bytes
Compressed: 52 bytes
Ratio: ~2.5x for this short series
The first sample is expensive (128 bits), and these particular decimal values are XOR-unfriendly (46-50 meaningful bits per change, as shown above). Production metrics behave better: roughly half of consecutive values are identical, the average cost settles near 11 bits per sample, and over a 2-hour chunk (480 samples at 15-second intervals) the amortized compression ratio converges toward the paper's 12x.
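For completeness, the two cost models from the earlier sketches can be combined to reproduce this table's totals. This remains illustrative; for brevity it assumes that any nonzero delta-of-delta in this series would fit the 16-bit bucket:

```go
package main

import (
	"fmt"
	"math"
	"math/bits"
)

// Reproduce the 8-sample totals above by summing the timestamp and value
// cost models.
func main() {
	ts := []int64{1709827200000, 1709827215000, 1709827230000, 1709827245000,
		1709827260000, 1709827275000, 1709827290000, 1709827305000}
	vals := []float64{47.2, 47.3, 47.3, 47.2, 47.3, 47.3, 47.2, 48.1}

	tsBits := 64 + 14 // base timestamp + first delta
	prevDelta := ts[1] - ts[0]
	for i := 2; i < len(ts); i++ {
		delta := ts[i] - ts[i-1]
		if delta == prevDelta {
			tsBits++ // delta-of-delta is zero: 1 bit
		} else {
			tsBits += 16 // simplification: assume the 16-bit bucket suffices
		}
		prevDelta = delta
	}

	valBits := 64 // first value stored in full
	prev := math.Float64bits(vals[0])
	lead, trail := -1, -1
	for i := 1; i < len(vals); i++ {
		cur := math.Float64bits(vals[i])
		xor := cur ^ prev
		prev = cur
		switch {
		case xor == 0:
			valBits++ // identical value
		case lead >= 0 && bits.LeadingZeros64(xor) >= lead && bits.TrailingZeros64(xor) >= trail:
			valBits += 2 + (64 - lead - trail) // reuse the previous window
		default:
			lead, trail = bits.LeadingZeros64(xor), bits.TrailingZeros64(xor)
			valBits += 2 + 5 + 6 + (64 - lead - trail) // new window
		}
	}

	fmt.Printf("timestamps: %d bits, values: %d bits, total: %d bits (%d bytes)\n",
		tsBits, valBits, tsBits+valBits, (tsBits+valBits+7)/8)
	// Prints: timestamps: 84 bits, values: 332 bits, total: 416 bits (52 bytes)
}
```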
The Math at Scale
Timestamps: ~1 bit each in the common case (the paper found 96% of timestamps compress to a single bit)
Values: 1 bit when unchanged (51% of samples), roughly 26-37 bits otherwise
Measured total per sample: ~11 bits = 1.37 bytes (the paper's average for 2-hour blocks)
Uncompressed: 16 bytes per sample
Compression ratio: 16 / 1.37 ≈ 12x
At production scale:
| Throughput | Uncompressed/day | Gorilla-compressed/day | Savings |
|---|---|---|---|
| 1M samples/sec | 1.38 TB | 118 GB | 1.26 TB |
| 50M samples/sec | 69 TB | 5.9 TB | 63 TB |
| 500M samples/sec | 691 TB | 59 TB | 632 TB |
The difference between Gorilla and a generic 4x compressor at 500M samples/sec is roughly 114 TB/day of additional storage (173 TB/day versus 59 TB/day). Over a year, that is about 41 PB of excess storage cost.
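The table follows from straightforward arithmetic. A short Go sketch reproduces it, using decimal terabytes and the paper's 1.37 bytes/sample average:

```go
package main

import "fmt"

// Bytes per day at several ingest rates: uncompressed (16 bytes/sample)
// versus Gorilla-compressed (1.37 bytes/sample, the paper's average).
func main() {
	const (
		rawBytesPerSample     = 16.0
		gorillaBytesPerSample = 1.37
		secondsPerDay         = 86400.0
		tb                    = 1e12 // decimal terabyte
	)
	for _, samplesPerSec := range []float64{1e6, 50e6, 500e6} {
		samplesPerDay := samplesPerSec * secondsPerDay
		raw := samplesPerDay * rawBytesPerSample / tb
		compressed := samplesPerDay * gorillaBytesPerSample / tb
		fmt.Printf("%4.0fM samples/sec: %7.2f TB/day raw, %6.2f TB/day compressed, %7.2f TB/day saved\n",
			samplesPerSec/1e6, raw, compressed, raw-compressed)
	}
}
```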
Chunk Structure and Block Layout
Samples are not compressed individually. They are grouped into chunks, typically covering a fixed time window:
- Prometheus: 2-hour blocks. Each block contains compressed chunks for all active series in that window. An index file maps series IDs to chunk offsets.
- VictoriaMetrics: Monthly partitions with parts (similar to LSM-tree levels). Small parts are periodically merged into larger parts.
- Facebook Gorilla (original): 2-hour in-memory blocks. Each block starts with a header containing the base timestamp. Samples stream-append into the block. Once the window closes, the block becomes immutable.
Chunk boundaries introduce fixed overhead: a header with the base timestamp, the first sample stored in full, and alignment padding. This is why very short chunks (under 30 seconds of data) compress poorly. The overhead is amortized over the chunk duration.
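A back-of-the-envelope model shows how quickly the fixed per-chunk cost is amortized. The 16-bit header and the 11-bit average for subsequent samples below are assumptions for illustration; real chunk formats differ:

```go
package main

import "fmt"

// Effective compression ratio as a function of chunk length: every chunk
// pays for a small header plus a full 128-bit first sample, then roughly
// 11 bits per additional sample.
func main() {
	const (
		headerBits      = 16.0  // assumed chunk header size (illustrative)
		firstSampleBits = 128.0 // base timestamp + first value, uncompressed
		avgSampleBits   = 11.0  // paper's average for subsequent samples
		rawSampleBytes  = 16.0
	)
	// 30 seconds, 10 minutes, 30 minutes, and 2 hours at a 15-second scrape interval.
	for _, samples := range []int{2, 40, 120, 480} {
		compressedBytes := (headerBits + firstSampleBits + float64(samples-1)*avgSampleBits) / 8
		rawBytes := float64(samples) * rawSampleBytes
		fmt.Printf("%4d samples: %6.1f B compressed vs %6.0f B raw, ratio %4.1fx\n",
			samples, compressedBytes, rawBytes, rawBytes/compressedBytes)
	}
}
```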
Variants and Adaptations
Every major TSDB has adapted Gorilla's core ideas:
Prometheus TSDB uses Gorilla encoding within 2-hour blocks stored on local disk. The block structure includes an index (label-to-series mapping), chunk files (Gorilla-compressed samples), and tombstones (for deletions). The compactor merges multiple 2-hour blocks into larger blocks for efficient querying.
VictoriaMetrics applies Gorilla compression in the hot path (vminsert to vmstorage) and layers ZSTD on top for cold data exported to S3 as Parquet files. The combination gets Gorilla's 12x for recent data and an additional 2-3x from ZSTD for archived data, pushing effective compression to 25-35x on cold storage.
M3DB (Uber) developed M3TSZ, a Gorilla variant optimized for their workload of 6.6 billion active time series. The key modification is more aggressive encoding of the leading/trailing zero metadata to reduce overhead for series with highly variable values.
InfluxDB IOx uses Apache Arrow columnar format with Gorilla encoding specifically for float columns. Timestamps use delta encoding in the Arrow representation. This hybrid approach integrates Gorilla's strength (float compression) with Arrow's strength (columnar scans, predicate pushdown).
Limitations
Irregular timestamps destroy compression efficiency. If your data arrives at random intervals, delta-of-deltas are rarely zero. Example: timestamps arriving at intervals of 15s, 13s, 18s, 14s produce delta-of-deltas of -2s, +5s, -4s, each costing 9 bits instead of 1 at second resolution (16-36 bits at millisecond resolution). Compression drops from 12x to 5-7x, at which point ZSTD alone performs comparably with less complexity.
Labels and metadata are not compressed by Gorilla. The algorithm handles only the numeric timestamp-value pairs. The metric name, label keys, and label values (e.g., {service="checkout", region="us-east-1"}) require a separate inverted index. In VictoriaMetrics, the mergeset index handles this. In Prometheus, the index file within each block handles it.
Decode is sequential within a chunk. Because each sample's encoding depends on the previous sample (delta-of-delta, XOR with predecessor), you cannot jump to an arbitrary sample within a chunk. You must decode from the beginning. This is acceptable for typical metric queries (which read time ranges), but means random access within a chunk is O(n), not O(1). Prometheus mitigates this by keeping chunks small (2 hours) and using the index for coarse-grained seeking.
The first sample in every chunk is expensive. The base timestamp (64 bits) and first value (64 bits) are stored uncompressed. This 128-bit overhead is amortized across the chunk. A 2-hour chunk at 15-second intervals has 480 samples, making the overhead negligible (about 0.2% of the raw data). A 30-second chunk has only 2 samples, making the overhead 50% of the raw data.
Key Points
- •Gorilla exploits two properties unique to time-series data: timestamps arrive at regular intervals (delta-of-delta is usually zero = 1 bit), and consecutive float values are often identical or change slowly (identical values XOR to zero = 1 bit; similar values leave only a few meaningful bits). Generic compressors like LZ4 or ZSTD cannot see this structure
- •Delta-of-delta encoding stores each timestamp as the difference between consecutive deltas. With a regular 15-second scrape interval, every delta is 15s, every delta-of-delta is 0, and every timestamp after the second costs exactly 1 bit
- •XOR encoding compresses float64 values by XOR-ing consecutive samples. According to Facebook's Gorilla paper, 51% of all production metric values XOR to exactly zero with their predecessor (1 bit), 30% compress to ~26.6 bits, and 19% need ~36.9 bits; the identical-value case is what keeps the average cost low
- •The combined result is the paper's measured ~11 bits per sample (1.37 bytes) compared to 128 bits uncompressed (16 bytes). That is a 12x compression ratio. At 500M samples/sec, this is the difference between 691 TB/day and roughly 59 TB/day of storage
- •Every modern time-series database uses Gorilla or a variant: Prometheus TSDB, VictoriaMetrics, M3DB (Uber), InfluxDB IOx, and TimescaleDB. The algorithm is the foundation of practical TSDB storage
Common Mistakes
- ✗Assuming Gorilla compression works equally well on irregular data. Delta-of-delta encoding relies on consistent intervals. If timestamps arrive at random intervals, delta-of-deltas are rarely zero and timestamps cost 9-36 bits each instead of 1 bit
- ✗Confusing the theoretical 1.37 bytes/sample with production reality. Chunk headers, label storage overhead, index structures, and compression boundary effects push the effective rate to 1.5-2 bytes per sample in real systems
- ✗Applying Gorilla to non-numeric time-series data. The algorithm compresses float64 timestamp-value pairs only. Labels, string fields, and metadata require separate compression (inverted index for labels, dictionary encoding for strings)
- ✗Using excessively short chunk durations. Each chunk has fixed overhead (header, block boundary markers). Very short chunks (under 30 seconds of data) waste a higher percentage of space on overhead, reducing the effective compression ratio
- ✗Not layering a general-purpose compressor on top for cold storage. Gorilla is optimized for fast encode/decode in the hot path. For cold data read infrequently, applying ZSTD on top of Gorilla-compressed blocks can squeeze out another 2-3x