QuestDB
The zero-GC time series database that actually delivers millions of rows per second
Why This Exists
Most time series databases are fast enough for most workloads. InfluxDB handles 500K points/sec. TimescaleDB does 200-400K rows/sec. For a lot of teams, that is more than sufficient. But some workloads genuinely need millions of rows per second on a single machine: high-frequency trading, telecom network monitoring, large-scale IoT, real-time ad bidding.
QuestDB was built by a team that previously worked on low-latency trading systems, and it shows. The entire architecture is designed around avoiding the things that make databases slow: garbage collection pauses, unnecessary memory copies, row-by-row processing, and excessive system calls. The result is a database that can ingest 1.4M rows/sec and query billions of rows with sub-second latency, on hardware that costs less than expected.
The tradeoff is maturity. QuestDB is younger than InfluxDB and TimescaleDB. It does not have clustering, the community is smaller, and some enterprise features are still being built. But for single-node performance per dollar, nothing else comes close.
Storage Engine Internals
QuestDB stores data in a column-oriented, append-optimized format. Each table is a directory. Each column is a file within that directory. A table with 10 columns has 10 files plus metadata. This means reads that touch 2 of 10 columns only read 20% of the data from disk: the same principle that makes ClickHouse fast for analytics.
Memory-mapped files are core to the architecture. Column files are mapped into virtual memory using mmap(). Reads and writes go through the OS page cache with no explicit buffer management in the application. This eliminates an entire layer of complexity (no buffer pool, no page replacement algorithm, no cache coherence logic) and lets the OS handle what it does best: managing page faults and caching hot data in RAM.
The JVM's garbage collector is essentially bypassed. All data structures for storage, indexing, and query processing use off-heap memory allocated with Unsafe.allocateMemory() or memory-mapped regions. The JVM heap stays small (typically under 1GB) even when QuestDB manages hundreds of gigabytes of data. This is what "zero-GC" means in practice. It is not that no GC happens, but that GC has so little work to do that pauses are measured in microseconds, not milliseconds.
Partitioning is time-based. Each partition is a subdirectory containing column files for that time range. The default is daily partitions, but hourly, monthly, or yearly options are available. When a query has a time range filter, QuestDB's planner eliminates partitions outside that range before touching any data. On a table with 365 daily partitions, a query over the last 3 days reads 3 partitions instead of 365. This is the biggest optimization for typical time series queries.
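As a rough sketch of how this looks in DDL (table and column names here are illustrative, not taken from any particular deployment), the designated timestamp and partition unit are declared at creation time, and time-filtered queries are pruned automatically:

```sql
-- One directory per table, one file per column, one subdirectory per daily partition.
CREATE TABLE sensors (
    device       SYMBOL,     -- low-cardinality tag stored as a dictionary index
    temperature  DOUBLE,
    humidity     DOUBLE,
    ts           TIMESTAMP
) TIMESTAMP(ts)              -- designated timestamp: required for partition pruning
  PARTITION BY DAY;

-- With daily partitions, this scans roughly 3 partitions instead of the whole table.
SELECT avg(temperature)
FROM sensors
WHERE ts > dateadd('d', -3, now());
```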
Ingestion Architecture
The InfluxDB Line Protocol (ILP) path is where QuestDB's speed comes from. Data arrives over TCP as text lines: sensors,device=d1 temperature=23.5 1640000000000000000. QuestDB parses these with a zero-copy parser that does not allocate Java objects for each field. Values get extracted directly from the network buffer and written to the appropriate column files.
Writes go through a write-ahead log (WAL) for durability. Committed data in the WAL survives crashes. The WAL writer batches multiple ILP lines into a single write operation, amortizing the fsync cost across many rows. This batching is automatic and transparent to the client.
Out-of-order (O3) ingestion is handled by a dedicated merge process. When data arrives with timestamps older than the current partition's latest timestamp, QuestDB buffers it in memory and periodically merges it into the correct position in the column files. The o3MaxLag setting controls how long QuestDB waits before committing out-of-order data. Larger values handle more skew but use more memory.
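A hedged sketch of how the knobs in this section are typically set (the WAL clause and the o3MaxLag parameter spelling vary by QuestDB version, so treat the exact syntax as an assumption to verify against your release; the trades table is illustrative):

```sql
-- WAL-enabled table: durable commits, batched fsyncs (assumed syntax for recent versions).
CREATE TABLE trades (
    symbol SYMBOL,
    price  DOUBLE,
    ts     TIMESTAMP
) TIMESTAMP(ts) PARTITION BY DAY WAL;

-- Widen the out-of-order window for sources with clock skew; larger values buffer
-- more rows in memory before committing (parameter name and unit syntax assumed).
ALTER TABLE trades SET PARAM o3MaxLag = 30m;
```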
The result: on an 8-core AMD machine with NVMe storage, QuestDB ingests 1.4 million rows/sec through ILP with each row containing a timestamp, 2 tags, and 4 numeric fields. That number comes from the Time Series Benchmark Suite (TSBS) benchmarks.
Query Engine
QuestDB uses standard SQL with time series extensions. The query engine is hand-written (not generated) and uses SIMD instructions for filter evaluation and aggregation. A filter like WHERE temperature > 30.0 gets compiled to AVX2 instructions that process four double values per instruction. This matters when scanning billions of rows.
Key time series SQL extensions:
SAMPLE BY is the time-bucketing operator. SELECT avg(temp), device FROM sensors SAMPLE BY 1h groups data into 1-hour buckets. Add FILL(PREV) to carry forward the last value for missing intervals, or FILL(LINEAR) for interpolation. ALIGN TO CALENDAR ensures buckets start at clean hour/day boundaries instead of relative to the first row.
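For example, against the illustrative sensors table from earlier, an hourly downsample with gap filling and calendar-aligned buckets looks like this:

```sql
-- Hourly average temperature; FILL(PREV) carries the last value across empty buckets.
SELECT ts, avg(temperature)
FROM sensors
SAMPLE BY 1h FILL(PREV) ALIGN TO CALENDAR;
```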
LATEST ON retrieves the most recent value per group. SELECT * FROM sensors LATEST ON timestamp PARTITION BY device_id returns the latest reading for each device. This is a common IoT query that would require a correlated subquery or window function in standard SQL.
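Concretely, the per-device latest-reading query is one line (again using the illustrative sensors table):

```sql
-- Most recent row for each device, resolved on the designated timestamp column.
SELECT *
FROM sensors
LATEST ON ts PARTITION BY device;
```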
ASOF JOIN and LT JOIN are for joining time series that do not have matching timestamps. An ASOF JOIN matches each row in the left table with the most recent right-table row at or before its timestamp. This is essential for financial data where trades and quotes arrive at different times and need to be correlated.
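A typical trades-and-quotes pairing might look like the following sketch (the quotes table and its columns are illustrative):

```sql
-- For every trade, attach the latest quote at or before the trade's timestamp.
SELECT t.ts, t.symbol, t.price, q.bid, q.ask
FROM trades t
ASOF JOIN quotes q ON (symbol);
```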
Capacity Planning
Ingestion throughput depends on the protocol:
| Protocol | Rows/sec (8 cores) | Use Case |
|---|---|---|
| ILP over TCP | 1.0-1.4M | Production ingestion |
| ILP over UDP | 800K-1M | Best-effort metrics |
| PostgreSQL wire | 50-100K | Ad-hoc inserts, migrations |
| CSV COPY | 200-400K | Bulk historical loads |
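For the bulk-load row above, the CSV path is driven through SQL. A sketch (file and table names are illustrative; the CSV must be readable by the server, typically from its configured import directory):

```sql
-- Bulk-import historical data; throughput corresponds to the CSV COPY row above.
COPY sensors_history FROM 'sensors_2023.csv' WITH HEADER true;
```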
Storage: QuestDB stores raw column data without compression by default (compression is being added). Expect roughly 8-16 bytes per numeric value, so a row with a timestamp and 6 double columns is about 56 bytes. At around 2,500 rows/sec that works out to roughly 500MB/hour, or 12GB/day. With daily partitions and 90-day retention, that is roughly 1.1TB. Memory-mapped files mean QuestDB benefits heavily from large RAM for caching hot partitions.
Memory: QuestDB needs enough RAM to hold the active partitions and the O3 merge buffers. For a workload with daily partitions and 2 days of hot data, 16GB handles most cases. At higher cardinality (10M+ unique series), the symbol dictionaries consume more memory. A 64GB machine handles even aggressive workloads comfortably on a single node.
Failure Scenarios
Scenario 1: Out-of-order data exceeds o3MaxLag and gets dropped. An IoT fleet has devices with unreliable clocks. Some devices send data with timestamps 30 minutes in the past. The default o3MaxLag is 10 minutes, so anything older than that gets silently dropped. The team does not notice until dashboards show gaps. Detection: compare ingested_rows_total against expected row count from data sources. If there is a persistent gap, out-of-order drops are likely. Recovery: increase o3MaxLag to cover the maximum expected clock skew. Be aware this increases memory usage because QuestDB buffers more data before committing. If the skew is extreme (hours), consider fixing the source clocks or adding a timestamp correction layer in the ingestion pipeline.
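One low-tech detection query (a sketch against the illustrative sensors table) is to bucket ingested rows by hour and compare the counts with what the fleet should produce; a sustained shortfall points at late rows being discarded:

```sql
-- Hourly row counts; compare against expected devices * samples per hour.
SELECT ts, count()
FROM sensors
SAMPLE BY 1h;
```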
Scenario 2: Memory-mapped file limits cause failures. QuestDB uses mmap() extensively. On Linux, the default vm.max_map_count is 65530. With many partitions and columns, QuestDB can exhaust this limit. Symptom: "Cannot allocate memory" errors even when plenty of RAM is free. Detection: check cat /proc/sys/vm/max_map_count and compare against the number of memory-mapped regions in /proc/<pid>/maps. Recovery: increase vm.max_map_count to 1048576 or higher with sysctl -w vm.max_map_count=1048576. Add it to /etc/sysctl.conf to persist across reboots. Also set fs.file-max high enough to accommodate the number of open column files. This is a standard Linux tuning step for any memory-mapped database but easy to overlook.
Pros
- • Ingestion speed is genuinely best-in-class. Millions of rows/sec on commodity hardware.
- • Standard SQL with time series extensions, no new query language to learn
- • Zero-GC Java implementation avoids the latency spikes that plague other JVM databases
- • Built-in support for InfluxDB Line Protocol and PostgreSQL wire protocol
- • Column-oriented storage with SIMD-accelerated query execution
Cons
- • Younger project with a smaller community compared to InfluxDB or TimescaleDB
- • No built-in replication or clustering yet. Single-node only for now.
- • Limited support for UPDATE and DELETE operations
- • Ecosystem of integrations and connectors is still growing
- • Documentation covers the basics but lacks depth on advanced operational topics
When to use
- • You need the fastest possible ingestion for high-volume time series data
- • Financial or trading applications where microsecond timestamps matter
- • Your team wants SQL and does not want to learn InfluxQL, Flux, or PromQL
- • Single-node deployment is acceptable and you want maximum performance per node
When NOT to use
- • You need multi-node clustering or built-in replication for HA
- • Heavy UPDATE/DELETE workloads on existing data
- • You need a proven, battle-tested solution for mission-critical systems
- • Prometheus-compatible monitoring (use VictoriaMetrics instead)
Key Points
- • QuestDB achieves zero garbage collection pauses by managing memory outside the JVM heap. It uses memory-mapped files and off-heap data structures for all storage operations. The JVM's garbage collector has almost nothing to collect, which eliminates the GC pauses that make other Java databases unsuitable for latency-sensitive workloads.
- • The storage engine is column-oriented with each column stored in a separate file. Timestamps go in timestamp.d, values in value.d, and so on. This means a query that only reads timestamp and temperature never touches the humidity, pressure, or device_id columns. Same principle as ClickHouse, but QuestDB adds memory-mapped I/O so the OS page cache handles data access with minimal system call overhead.
- • Ingestion through the InfluxDB Line Protocol (ILP) is the fastest path. QuestDB processes ILP data with zero-copy parsing and batched writes to column files. Benchmarks show 1.4 million rows/sec on a single 8-core machine. The PostgreSQL wire protocol is slower (it was not designed for bulk ingestion) but useful for ad-hoc queries and integration with existing tools.
- • SAMPLE BY is QuestDB's time-bucketing syntax and it is cleaner than most alternatives. Instead of writing GROUP BY time_bucket('5 minutes', timestamp), the syntax is just SAMPLE BY 5m. It supports FILL for handling gaps (FILL(PREV), FILL(LINEAR), FILL(NULL)) and ALIGN TO CALENDAR for consistent bucket boundaries.
- • Symbol columns are QuestDB's answer to low-cardinality string data. Instead of storing the full string for every row, QuestDB maintains a dictionary and stores integer indices. Filters on symbol columns use the dictionary for fast lookups. For a column like 'region' with 10 unique values across a billion rows, symbol storage saves massive amounts of space; a short sketch follows this list.
- • Designated timestamp columns enable partition pruning. When creating a table with a designated timestamp and partition by day/month/year, queries with time range filters skip entire partitions. A query over the last hour on a year-long table with daily partitions reads 1 partition instead of 365.
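A sketch of the symbol-column declaration mentioned above (names and capacity are illustrative): the expected number of distinct values can be declared up front so the dictionary is sized appropriately.

```sql
-- 'region' has few distinct values: rows store small dictionary indices, not repeated strings.
CREATE TABLE requests (
    region  SYMBOL CAPACITY 64 CACHE,   -- expected distinct values; keep the dictionary cached
    latency DOUBLE,
    ts      TIMESTAMP
) TIMESTAMP(ts) PARTITION BY DAY;
```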
Common Mistakes
- ✗ Using the PostgreSQL wire protocol for high-volume ingestion. The PG protocol serializes data as text, parses it row-by-row, and has per-statement overhead. For bulk ingestion, always use ILP over TCP. The difference is easily 10-30x in throughput.
- ✗ Not using SYMBOL type for low-cardinality string columns. Storing 'us-east-1' as a VARCHAR in every row wastes space and makes filtering slow. SYMBOL type stores the integer index and filters against the dictionary. For any column with fewer than 100K unique values, use SYMBOL.
- ✗ Creating too many partitions. Partitioning by hour on a table that stores a year of data creates 8,760 partitions. Each partition has overhead for open file descriptors and metadata. Partition by day for most workloads, by month if the data volume per day is small.
- ✗ Ignoring out-of-order ingestion settings. QuestDB handles out-of-order data, but it has limits. The o3MaxLag setting (default 10 minutes) controls how far back out-of-order data can arrive. If data sources have clock skew beyond this window, data gets dropped silently. Increase o3MaxLag for unreliable data sources, but know that larger values use more memory.
- ✗ Expecting QuestDB to handle transactional workloads. There are no multi-row transactions, no rollback, and no isolation levels. QuestDB can update rows or delete specific records, but it is slow and not the intended use case. Treat it as an append-only store with occasional maintenance operations.