ZSTD Compression (Zstandard)
Architecture
The Problem: Not Everything Is Time-Series Data
Gorilla compression achieves 12x on time-series samples because it exploits two domain-specific properties: regular timestamp intervals and slowly-changing float values. But not all data in an observability platform is sequential time-series samples.
Downsampled metric aggregates (min, max, avg, sum, count) are raw float64 values with no sequential relationship. Parquet column blocks contain mixed data types. Kafka message payloads are protobuf-encoded structs. Flink checkpoint state is serialized RocksDB entries. WAL segments are append-only binary logs. None of these have the structure that Gorilla needs.
These workloads need a general-purpose compressor that works on arbitrary bytes, compresses well, and decompresses fast. ZSTD fills that gap.
How ZSTD Works: Three Stages
ZSTD processes input data through three stages: find repeated patterns, encode them as compact sequences, then compress those sequences with entropy coding. Once you see how each stage works, the compression level tradeoffs start to make sense.
Stage 1: LZ77-Style Dictionary Matching
The first stage slides a window over the input, looking for byte sequences that appeared earlier. When it finds a match, it replaces the repeated bytes with a back-reference: "copy N bytes from M positions back."
Input string (78 bytes):
"service=checkout,method=POST,status=200|service=checkout,method=GET,status=200"
Stage 1 finds matches:
Position 0-39: "service=checkout,method=POST,status=200|" → stored as literals (first occurrence)
Position 40-56: "service=checkout," → match! (offset=40, length=17)
Position 57-67: "method=GET," → partial: "method=" matches (offset=40, length=7), "GET," is literal
Position 68-77: "status=200" → match! (offset=39, length=10)
Result: 78 bytes → ~53 bytes of literals + back-references
The compression level controls how hard this stage searches for matches. Level 1 uses a fast hash table with limited look-back. Level 19 uses a full optimal parser that considers every possible match combination. More searching = better matches = smaller output = more CPU time.
Stage 2: Sequence Encoding
Stage 1 produces a stream of interleaved literals and matches. Stage 2 structures this into a compact format:
Each "sequence" is a triple:
(literal_length, offset, match_length)
From the example above (each sequence emits its literals first, then its match):
Sequence 1: (40 literals, offset=40, match_length=17) → the first 40 bytes verbatim, then copy "service=checkout,"
Sequence 2: (0 literals, offset=40, match_length=7) → copy "method="
Sequence 3: (4 literals "GET,", offset=39, match_length=10) → emit "GET,", then copy "status=200"
This sequence representation is more compressible than the raw back-references because the literal lengths, offsets, and match lengths each follow predictable distributions (short matches are common, large offsets are rare).
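To make the triple representation concrete, here is a toy decoder in Python that replays (literals, offset, match_length) sequences to rebuild the example string. This is a simplified sketch of the idea, not zstd's actual block format; the names `Sequence` and `replay` are made up for illustration.

```python
from typing import NamedTuple

class Sequence(NamedTuple):
    literals: bytes      # bytes emitted to the output verbatim
    offset: int          # how far back in the output to copy from
    match_length: int    # how many bytes to copy

def replay(sequences):
    """Rebuild the original data by replaying (literals, offset, match_length) triples."""
    out = bytearray()
    for seq in sequences:
        out += seq.literals
        for _ in range(seq.match_length):
            out.append(out[-seq.offset])   # byte-at-a-time, so a match may overlap its own output
    return bytes(out)

original = b"service=checkout,method=POST,status=200|service=checkout,method=GET,status=200"
seqs = [
    Sequence(b"service=checkout,method=POST,status=200|", 40, 17),  # literals, then copy "service=checkout,"
    Sequence(b"", 40, 7),                                            # copy "method="
    Sequence(b"GET,", 39, 10),                                       # emit "GET,", then copy "status=200"
]
assert replay(seqs) == original
```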
Stage 3: Entropy Coding (FSE + Huffman)
The final stage compresses the sequence stream using two entropy coders:
Huffman coding compresses the literal bytes. In the example, the characters s, e, =, , appear far more often than P, O, G. Huffman assigns shorter bit codes to frequent characters and longer codes to rare ones. If e appears 10% of the time, it gets ~3.3 bits instead of 8.
Finite State Entropy (FSE) compresses the literal lengths, match lengths, and offsets. This is the part that really sets ZSTD apart from older algorithms. FSE gets essentially the same compression as arithmetic coding (the theoretical optimum) but decodes with a single table lookup per symbol instead of a division. That difference is a large part of why ZSTD decompresses at ~1500 MB/s while gzip (which uses Huffman for everything) tops out at ~300 MB/s.
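To put rough numbers on the Huffman half of Stage 3, the sketch below computes the Shannon entropy of the literal bytes left over from Stage 1 in the worked example. Entropy is the floor every entropy coder chases; Huffman approximates it in whole bits, FSE gets closer. Plain Python, nothing zstd-specific assumed.

```python
import math
from collections import Counter

# The literal bytes Stage 1 left behind in the worked example
# (the first 40 bytes plus the un-matched "GET,").
literals = b"service=checkout,method=POST,status=200|GET,"

counts = Counter(literals)
total = len(literals)

# Shannon entropy: the lower bound any entropy coder is chasing.
entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
print(f"{entropy:.2f} bits/byte instead of 8 bits/byte raw")

# A frequent byte like 'e' deserves a short code...
print(f"'e': ~{-math.log2(counts[ord('e')] / total):.1f} bits")
# ...while a rare byte like 'P' deserves a long one.
print(f"'P': ~{-math.log2(counts[ord('P')] / total):.1f} bits")
```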
The full pipeline on our 78-byte example:
Input: 78 bytes (raw string)
After Stage 1 (matching): ~53 bytes (literals + back-references)
After Stage 2 (sequences): ~45 bytes (structured triples)
After Stage 3 (entropy): ~28 bytes (bit-optimal encoding)
Compression ratio: 78 / 28 ≈ 2.8x
On larger inputs, ZSTD achieves better ratios because the matching window is bigger (up to 128 KB at level 1, up to 8 MB at level 19) and the entropy coder has more data to learn symbol frequencies from.
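A minimal round trip shows both effects. The sketch assumes the `zstandard` Python bindings (any other binding or the CLI behaves the same way): the 78-byte example barely shrinks once frame and block headers are added, while a few thousand similar lines give Stage 1 enough history to work with.

```python
import zstandard as zstd  # assumes the python-zstandard bindings (pip install zstandard)

cctx = zstd.ZstdCompressor(level=3)
dctx = zstd.ZstdDecompressor()

# The 78-byte string from the worked example: too small for a good ratio
# once frame and block headers are added to the output.
tiny = b"service=checkout,method=POST,status=200|service=checkout,method=GET,status=200"
print(len(tiny), "->", len(cctx.compress(tiny)), "bytes")

# The same kind of content repeated across a few thousand lines: plenty of
# history for the matching stage, so the ratio climbs well past the tiny case.
big = b"".join(
    f"service=checkout,method={'POST' if i % 2 else 'GET'},status=200\n".encode()
    for i in range(5000)
)
packed = cctx.compress(big)
assert dctx.decompress(packed) == big
print(f"{len(big)} -> {len(packed)} bytes ({len(big) / len(packed):.0f}x)")
```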
Compression Levels: The Speed-Ratio Tradeoff
ZSTD's 22 compression levels control how hard Stage 1 searches for matches. Decompression speed stays constant regardless of level because the compressed format is the same. Only the search effort during compression changes.
| Level | Compression Speed | Decompression Speed | Ratio (Silesia benchmark) | Use Case |
|---|---|---|---|---|
| 1 | ~500 MB/s | ~1500 MB/s | ~2.9x | Real-time: Kafka messages, network transfer |
| 3 (default) | ~400 MB/s | ~1500 MB/s | ~3.3x | General purpose: Parquet blocks, log files |
| 6 | ~150 MB/s | ~1500 MB/s | ~3.6x | Warm-tier storage: balanced speed and ratio |
| 9 | ~80 MB/s | ~1500 MB/s | ~3.8x | Cold-tier: write-once archival |
| 15 | ~15 MB/s | ~1500 MB/s | ~4.1x | Deep archival: maximize compression |
| 19 | ~5 MB/s | ~1500 MB/s | ~4.3x | Maximum compression (rarely worth the CPU) |
| 22 | ~2 MB/s | ~1500 MB/s | ~4.4x | Extreme: diminishing returns beyond level 19 |
Notice that decompression is always ~1500 MB/s. You pay the CPU cost of high compression levels only once (on write), but every read benefits from fast decompression. For write-once-read-many workloads like cold-tier metric archives and Parquet files, that tradeoff is well worth it.
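A quick way to see the tradeoff on your own data is to sweep levels against a single decompressor. The sketch below assumes the `zstandard` Python bindings and uses deliberately repetitive synthetic input, so the absolute ratios come out far higher than the Silesia numbers above; the point is the relative CPU cost per level and that one decompressor reads every level's output.

```python
import time
import zstandard as zstd  # assumes the python-zstandard bindings

# A few MB of synthetic log-like data; swap in a real Parquet block or WAL
# segment to benchmark your own workload.
raw = b"".join(
    f"service=checkout,method=POST,status=200,duration_ms={i % 997}\n".encode()
    for i in range(200_000)
)

dctx = zstd.ZstdDecompressor()   # one decompressor reads every level's output

for level in (1, 3, 6, 9, 19):
    cctx = zstd.ZstdCompressor(level=level)
    start = time.perf_counter()
    compressed = cctx.compress(raw)
    elapsed = time.perf_counter() - start
    assert dctx.decompress(compressed) == raw          # same format regardless of level
    print(f"level {level:>2}: {len(raw) / len(compressed):6.1f}x in {elapsed:.3f}s")
```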
Dictionary Mode: Compressing Small Data
Standard ZSTD needs enough input bytes to find patterns. On a 1 KB Kafka message, there simply are not enough repeated sequences for the matching stage to work with. You end up with a poor ~1.5x ratio, barely worth the CPU.
Dictionary mode solves this. You train a dictionary on a representative sample of your data:
Training (offline, one-time):
1. Collect 1000 sample Kafka messages (e.g., metric protobuf payloads)
2. zstd --train samples/ -o metric_dict --maxdict=100KB
3. ZSTD analyzes common byte patterns across all samples
4. Produces a 100 KB dictionary of shared context
Compression (runtime):
Without dictionary: 2 KB message → 1.3 KB (1.5x) (not worth it)
With dictionary: 2 KB message → 500 bytes (4.0x) (significant savings)
The dictionary provides the "history" that a small message lacks.
ZSTD matches input bytes against the dictionary as if they were
preceded by 100 KB of representative data.
Kafka supports ZSTD compression natively (producer config: compression.type=zstd). For dictionary mode, the application must manage dictionary distribution (ship the same dictionary to all producers and consumers). This complexity is worth it when compressing millions of small messages per second.
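A sketch of the training and runtime flow, assuming the `zstandard` Python bindings and a synthetic stand-in corpus (real training should use a large, genuinely representative sample set; on a toy corpus zstd may return a smaller dictionary than requested):

```python
import zstandard as zstd  # assumes the python-zstandard bindings (pip install zstandard)

# Stand-in corpus: in production you would capture 1000+ real messages offline.
samples = [
    (f'{{"service":"svc-{i % 40}","metric":"http_request_duration_seconds",'
     f'"host":"node-{i % 64}","trace":"{i:08x}","value":{(i * 37) % 1000}}}').encode()
    for i in range(5000)
]

# Train a shared dictionary (CLI equivalent: zstd --train ... --maxdict=100KB).
dictionary = zstd.train_dictionary(100 * 1024, samples)

# Producers and consumers must load the identical dictionary bytes (dictionary.as_bytes()).
cctx = zstd.ZstdCompressor(level=1, dict_data=dictionary)
dctx = zstd.ZstdDecompressor(dict_data=dictionary)

msg = samples[0]
packed = cctx.compress(msg)
assert dctx.decompress(packed) == msg
print(f"{len(msg)} -> {len(packed)} bytes with the shared dictionary")
```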
ZSTD vs Other Compressors
| Algorithm | Compression Speed | Decompression Speed | Ratio (Silesia) | Year | Best For |
|---|---|---|---|---|---|
| LZ4 | ~750 MB/s | ~4000 MB/s | ~2.1x | 2011 | Hot path where speed is paramount (WAL, memtable flush) |
| Snappy | ~500 MB/s | ~1500 MB/s | ~2.1x | 2011 | Legacy systems, Kafka default before ZSTD |
| gzip | ~30 MB/s | ~300 MB/s | ~3.2x | 1992 | HTTP content encoding, backward compatibility |
| ZSTD | ~400 MB/s | ~1500 MB/s | ~3.3x | 2016 | General purpose (replaced gzip in most new systems) |
| brotli | ~20 MB/s | ~400 MB/s | ~3.6x | 2015 | Static web assets (HTML, CSS, JS) |
Why did ZSTD win? It matches gzip's compression ratio while decompressing 5x faster and compressing 10x faster. That made it the natural choice for Kafka (replacing Snappy), Parquet (replacing gzip), and RocksDB (replacing LZ4/Snappy depending on tier).
LZ4 still wins when decompression speed is the only priority (4000 MB/s vs ZSTD's 1500 MB/s). RocksDB uses LZ4 for its hot-tier SST files and ZSTD for cold-tier compacted files, a common hot/cold tiering pattern.
Where ZSTD Appears in Data Infrastructure
ZSTD shows up at every layer below the hot path in modern data systems:
Columnar storage (Parquet, ORC). Parquet compresses each column block with ZSTD. Aggregated data (pre-computed doubles like min, max, avg, sum, count) has no sequential relationship between rows, so domain-specific compressors like Gorilla cannot help. ZSTD gives ~3x on these float64 arrays and is now the standard compression codec for data lake files on S3/GCS.
Message brokers (Kafka, Pulsar). Producers compress message batches with ZSTD level 1 before sending. Brokers store compressed batches on disk. Consumers decompress on read. At high throughput, ZSTD reduces network bandwidth by ~4x with minimal CPU overhead at level 1.
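At the producer, that is a configuration change rather than code. A minimal sketch, assuming the confluent-kafka Python client (librdkafka underneath) and a hypothetical broker address and topic:

```python
from confluent_kafka import Producer  # assumes the confluent-kafka client is installed

producer = Producer({
    "bootstrap.servers": "broker-1:9092",  # hypothetical broker address
    "compression.type": "zstd",            # batches are compressed before they hit the wire
    "compression.level": 1,                # stay at the fast end of the scale on the hot path
    "linger.ms": 20,                       # small batching window so there is something to compress
})

producer.produce("metrics", value=b'{"service":"checkout","value":0.23}')
producer.flush()  # brokers store the compressed batch as-is; consumers decompress on read
```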
Embedded storage engines (RocksDB, LevelDB). RocksDB compresses SST files at the block level. Hot tiers (L0/L1) typically use LZ4 for fast read/write. Cold tiers (L2+) use ZSTD for better compression on compacted data read less frequently. Any system built on RocksDB follows this tiered pattern: Flink state backends, CockroachDB, TiKV.
Trace and log storage. Systems that store traces or logs as Parquet files on object storage use ZSTD for column block compression. Variable-length strings (service names, operation names, attribute keys) compress well because repeated values create strong matching opportunities for Stage 1.
ZSTD vs Gorilla: Different Tools for Different Data
Gorilla and ZSTD are not competing. They handle different data at different tiers.
| | Gorilla | ZSTD |
|---|---|---|
| Input | Sequential time-series samples | Arbitrary bytes |
| Exploits | Regular intervals, slow changes | Repeated byte patterns |
| Ratio | ~12x on regular metrics | ~3x on float64 arrays |
| Speed | O(1) per sample encode/decode | ~400 MB/s compress, ~1500 MB/s decompress |
| Works on Parquet | No (aggregated doubles have no sequential relationship) | Yes (general-purpose, works on any byte stream) |
| Hot tier | Yes (vmstorage raw samples) | No (too slow for ingest path) |
| Cold tier | No (data is pre-aggregated) | Yes (Parquet blocks on S3) |
| Can stack | ZSTD on top of Gorilla gives another 2-3x for cold export (25-35x total) | N/A |
VictoriaMetrics uses exactly this layered approach: Gorilla in the hot path (vmstorage), and ZSTD-compressed Parquet for cold-tier S3 exports.
Limitations
CPU cost scales non-linearly with compression level. Going from level 1 to level 3 costs ~20% more CPU for ~15% better ratio. Going from level 3 to level 19 costs ~80x more CPU for only ~30% better ratio. Beyond level 9, diminishing returns hit hard. Most production systems stay at level 1-6.
No domain-specific optimization. ZSTD treats input as opaque bytes. If your data has exploitable structure (like time-series regularity), a domain-specific compressor like Gorilla will always win on that dimension. ZSTD is the best general-purpose option, not the best option for any specific data type.
Dictionary mode requires coordination. All producers and consumers must share the same dictionary. Dictionary changes require a rolling deployment. If a consumer receives data compressed with an unknown dictionary, decompression fails. This operational complexity means dictionary mode is only worth it for high-volume small-message workloads.
Memory usage scales with window size and compression level. Level 1 uses ~1 MB of memory per compression context. Level 19 uses ~100+ MB. For workloads compressing thousands of streams in parallel (like Kafka brokers compressing per-partition), memory usage at high levels is non-trivial.
Key Points
- •ZSTD combines LZ77-style dictionary matching (find repeated byte sequences, replace with back-references) with two entropy coders: Finite State Entropy (FSE) for match lengths and offsets, and Huffman coding for literal bytes. This three-stage pipeline achieves gzip-level compression ratios at LZ4-level speeds
- •Compression levels 1-22 let you trade CPU time for compression ratio on the same data. Level 1 compresses at ~500 MB/s with ~2.9x ratio. Level 3 (default) hits ~400 MB/s at ~3.3x. Level 9 drops to ~80 MB/s but reaches ~3.8x. Decompression speed stays at ~1500 MB/s no matter which level was used to compress. That asymmetry matters for write-once-read-many workloads like cold storage, where you pay the compression cost once but decompress on every read
- •Dictionary mode is ZSTD's secret weapon for small data. Standard compression needs enough bytes to learn patterns within each input. For small messages (1-10 KB, like Kafka records or log lines), there is not enough data to find patterns. Dictionary mode pre-trains a 100 KB dictionary on representative samples, then every message compresses against that shared context. Compression ratio improves 3-5x over standard mode for small inputs
- •ZSTD is a general-purpose byte-level compressor. It sees no structure in the input, just bytes. Gorilla, by contrast, exploits time-series-specific patterns like regular timestamp intervals and slowly-changing float values. So Gorilla gets ~12x on sequential metric samples, while ZSTD gets ~3x on the same float64 arrays. Different tools for different tiers: Gorilla for hot-tier sequential samples, ZSTD for cold-tier aggregated data and Parquet blocks
- •ZSTD replaced gzip as the default compressor in most modern data infrastructure: Apache Kafka (producer compression), Apache Parquet (column block compression), RocksDB (SST file compression), ClickHouse (column compression), Linux kernel (initramfs, btrfs). It compresses as well as gzip but decompresses 5x faster, and dictionary mode handles small messages that gzip struggles with. That is a hard combination to beat
Common Mistakes
- ✗Using high compression levels (15-22) in latency-sensitive paths. Level 19 compresses at ~5 MB/s, which is 100x slower than level 1. High levels only make sense for offline batch jobs like cold-tier compaction or archival. For real-time ingestion and Kafka messages, stick to levels 1-3
- ✗Not using dictionary mode for small messages. Compressing 2 KB Kafka records with standard ZSTD gives ~1.5x ratio (barely worth the CPU). Training a dictionary on 1000 sample records and compressing with that dictionary gives ~4x. The dictionary is 100 KB of shared context that makes small-message compression viable
- ✗Comparing ZSTD ratio to Gorilla ratio as if they do the same thing. Gorilla achieves 12x on time-series data because it exploits domain-specific structure (delta-of-delta timestamps, XOR-encoded values). ZSTD achieves ~3x on the same data because it only sees bytes. They solve different problems: Gorilla for hot-tier raw samples, ZSTD for cold-tier aggregated doubles in Parquet
- ✗Ignoring decompression speed when choosing a compressor. ZSTD decompresses at ~1500 MB/s regardless of the compression level. gzip tops out at ~300 MB/s. On read-heavy workloads like cold-tier metric queries (write once, read many times), decompression speed matters more than compression speed, and that 5x gap is why ZSTD replaced gzip in Parquet and RocksDB
- ✗Setting the same compression level across all data tiers. Hot-path Kafka messages need level 1 (fast, ~2.9x). Warm-tier Parquet blocks benefit from level 3-6 (~3.3-3.6x). Cold-tier archival can use level 9+ (~3.8x+). Each tier has a different latency budget, so each should use a different level