Hot, Warm & Cold Data Tiering
Why It Exists
Not all data deserves the same storage. The last 24 hours of orders get queried 500 times per second. Last month's orders see maybe 10 queries per minute. Last year's orders sit there for compliance and the occasional customer support lookup.
Storing everything in Redis because "it is fast" costs $3,000 per month per terabyte. Storing everything in S3 because "it is cheap" means the checkout page takes 8 seconds to load order history. Tiering delivers fast where it matters and cheap where it does not.
The economics tell the whole story. 1 TB in Redis costs roughly $3,000/month. The same data on SSD-backed PostgreSQL costs about $100/month. On S3 it is $23/month. On S3 Glacier it is $4/month. That is a 750x cost difference between the fastest and cheapest tier. For a company storing 50 TB, the difference between "everything in one tier" and "intelligent tiering" can be $100K/month.
Defining the Tiers
Hot tier. Data accessed multiple times per second. Latency budget: sub-millisecond to single-digit milliseconds. Lives in memory (Redis, Memcached) or on NVMe SSDs with in-memory caching (PostgreSQL with shared_buffers tuned, Elasticsearch hot nodes).
Examples: active user sessions, real-time dashboards, live order status, current pricing, feature flags, rate limiter counters.
Warm tier. Data accessed a few times per minute to a few times per hour. Latency budget: 10-100ms. Lives on SSD-backed databases or Elasticsearch warm nodes. This is where most queryable data sits.
Examples: last 30 days of orders, recent log data, user profiles accessed occasionally, product catalog, recent search history.
Cold tier. Data accessed a few times per day or less. Latency budget: seconds to minutes. Acceptable because the queries are infrequent and the user expects to wait. Lives on object storage (S3), HDFS, Elasticsearch frozen nodes, or searchable snapshots.
Examples: compliance archives, historical analytics beyond 90 days, old audit trails, decommissioned product data, old user-generated content.
Frozen/archive tier. Data that might never be accessed again but must be retained for legal or regulatory reasons. Latency budget: hours (restore from archive before querying). Lives on S3 Glacier, S3 Glacier Deep Archive, or tape.
Examples: legal holds, 7-year financial records, HIPAA-mandated medical data retention.
Query Routing Patterns (Concrete Example)
Here is how tiering works in practice. Take an e-commerce order history API.
Request: GET /api/orders?user_id=123&days=365
The query router splits this into three sub-queries based on tier boundaries:
Hot path (last 24 hours). Query Redis: ZRANGEBYSCORE orders:123 <24h_ago> +inf. Returns in 0.3ms. Finds 2 recent orders.
Warm path (1 to 90 days). Query PostgreSQL: SELECT * FROM orders WHERE user_id = 123 AND created_at > now() - interval '90 days'. Returns in 8ms. Finds 15 orders.
Cold path (90 to 365 days). Query S3 via Athena: SELECT * FROM orders_archive WHERE user_id = 123 AND year_month >= '2025-06'. Returns in 1.2 seconds. Finds 8 orders.
The API merges all three results and returns 25 orders. Total response time is dominated by the cold-tier query at 1.2 seconds. But here is what matters: for the common case where a user checks their recent orders (last 7 days), only the hot and warm paths fire. That response comes back in under 10ms.
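As a minimal sketch of that routing logic in Python, assuming a redis-py client, a psycopg2 connection to the warm tier, and a run_athena_query() helper standing in for the Athena call (table names, key layout, and window boundaries are illustrative, not taken from a real schema):

```python
from datetime import datetime, timedelta, timezone

import redis
import psycopg2
import psycopg2.extras

HOT_WINDOW = timedelta(hours=24)
WARM_WINDOW = timedelta(days=90)

r = redis.Redis(host="localhost", port=6379, decode_responses=True)
pg = psycopg2.connect("dbname=shop user=app")  # warm tier: SSD-backed PostgreSQL


def run_athena_query(sql: str) -> list[dict]:
    """Hypothetical helper that submits an Athena query and waits for results.
    In practice this would wrap boto3's start_query_execution / get_query_results."""
    raise NotImplementedError


def get_order_history(user_id: int, days: int) -> list[dict]:
    now = datetime.now(timezone.utc)
    since = now - timedelta(days=days)
    results: list[dict] = []

    # Hot path: last 24h in a sorted set scored by timestamp (members could be
    # order IDs or serialized order payloads).
    hot_cutoff = now - HOT_WINDOW
    results += [
        {"order_id": member, "tier": "hot"}
        for member in r.zrangebyscore(f"orders:{user_id}", hot_cutoff.timestamp(), "+inf")
    ]

    # Warm path: 1-90 days from PostgreSQL. Only fires if the requested window
    # actually reaches past the hot boundary.
    if since < hot_cutoff:
        with pg.cursor(cursor_factory=psycopg2.extras.RealDictCursor) as cur:
            cur.execute(
                """SELECT * FROM orders
                   WHERE user_id = %s
                     AND created_at >= %s
                     AND created_at < %s""",
                (user_id, max(since, now - WARM_WINDOW), hot_cutoff),
            )
            results += [dict(row, tier="warm") for row in cur.fetchall()]

    # Cold path: beyond 90 days, scan the S3 archive. Slowest by far, so it is
    # skipped entirely for the common "recent orders" case.
    if since < now - WARM_WINDOW:
        results += run_athena_query(
            f"SELECT * FROM orders_archive WHERE user_id = {user_id} "
            f"AND created_at >= timestamp '{since:%Y-%m-%d %H:%M:%S}'"
        )

    return results
```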
The alternative without tiering? Store everything in PostgreSQL. Every query scans 365 days of data regardless of whether the user asked for yesterday or last year. At scale, the table grows to billions of rows and the database spends most of its time scanning old data that nobody asked for.
A smarter version of the cold path: pre-aggregate old data into monthly summaries and store those summaries in the warm tier. Most cold-tier queries do not need individual records. "Total spending: $2,340 in March 2025" is often enough. This avoids the Athena round-trip entirely for summary requests.
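One way to build those summaries is a nightly rollup inside the warm tier before the detail rows age out. A sketch, assuming hypothetical orders and order_monthly_summary tables with a unique constraint on (user_id, year_month):

```python
import psycopg2

ROLLUP_SQL = """
INSERT INTO order_monthly_summary (user_id, year_month, order_count, total_spend)
SELECT user_id,
       to_char(created_at, 'YYYY-MM') AS year_month,
       count(*)                       AS order_count,
       sum(total_amount)              AS total_spend
FROM orders
WHERE created_at >= date_trunc('month', now() - interval '1 month')
  AND created_at <  date_trunc('month', now())
GROUP BY user_id, to_char(created_at, 'YYYY-MM')
ON CONFLICT (user_id, year_month) DO UPDATE
  SET order_count = EXCLUDED.order_count,
      total_spend = EXCLUDED.total_spend;
"""


def rollup_last_month() -> None:
    """Nightly batch: summarize last month's orders before the detail rows are
    demoted to the cold tier, so summary queries never leave the warm tier."""
    with psycopg2.connect("dbname=shop user=app") as conn, conn.cursor() as cur:
        cur.execute(ROLLUP_SQL)
```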
Promotion and Demotion Policies
Time-based. The simplest approach. Move data from hot to warm after 24 hours, warm to cold after 90 days. This works when access patterns correlate with age, which they do for most transactional data. Most teams start here and it is good enough for a long time.
Access-count-based. Track how often each record (or partition, or index) is accessed. Demote when the access count falls below a threshold over a time window. More accurate than time-based, but it requires access-tracking infrastructure, which few storage engines provide out of the box.
Hybrid. Time-based demotion with access-count-based promotion. If a cold record suddenly gets accessed frequently (a viral old blog post, a reopened support ticket, a legal discovery request), promote it back to warm. This is the best of both worlds but the most complex to implement.
Manual override. Some data is always hot regardless of access patterns. Configuration, feature flags, pricing data. Pin it to the hot tier explicitly and never demote it.
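A sketch of the hybrid rule as plain logic; the thresholds, pinned keys, and Record shape are illustrative assumptions, not prescriptions:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Illustrative thresholds; tune them against measured access patterns.
HOT_MAX_AGE = timedelta(hours=24)
WARM_MAX_AGE = timedelta(days=90)
PROMOTE_ACCESSES_PER_DAY = 10                 # cold record suddenly popular -> promote
PINNED_KEYS = {"feature_flags", "pricing"}    # manual override: always hot


@dataclass
class Record:
    key: str
    tier: str                 # "hot" | "warm" | "cold"
    created_at: datetime
    accesses_last_24h: int


def target_tier(rec: Record, now: datetime | None = None) -> str:
    now = now or datetime.now(timezone.utc)
    age = now - rec.created_at

    if rec.key in PINNED_KEYS:
        return "hot"
    # Access-count-based promotion: popularity beats age.
    if rec.accesses_last_24h >= PROMOTE_ACCESSES_PER_DAY:
        return "hot" if age <= HOT_MAX_AGE else "warm"
    # Time-based demotion for everything else.
    if age <= HOT_MAX_AGE:
        return "hot"
    if age <= WARM_MAX_AGE:
        return "warm"
    return "cold"
```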
Concrete policy example (Elasticsearch ILM):
| Phase | Age | Actions | Node Type |
|---|---|---|---|
| Hot | 0-7 days | 3 replicas, force merge to 1 segment | NVMe SSD, 64GB RAM |
| Warm | 7-30 days | Shrink to 1 shard, 1 replica, read-only | SSD, 32GB RAM |
| Cold | 30-90 days | Searchable snapshot on S3, frozen | Minimal (S3-backed) |
| Delete | 90+ days | Delete index | N/A |
This policy handles a logging pipeline doing 100 GB/day. Only the last week sits on expensive hot hardware. Everything older than a month is on S3. Total storage cost drops by roughly 80% compared to keeping everything on hot nodes.
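Expressed as code, the policy above looks roughly like this. A sketch using the elasticsearch Python client; the snapshot repository name is assumed, and the exact client call shape varies between client versions:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Phases mirror the table: hot 0-7d, warm 7-30d, cold 30-90d on S3, delete at 90d.
# Replica count for hot indices is set in the index template, not in ILM.
policy = {
    "phases": {
        "hot": {
            "actions": {
                "rollover": {"max_age": "7d"},
                "forcemerge": {"max_num_segments": 1},
            }
        },
        "warm": {
            "min_age": "7d",
            "actions": {
                "shrink": {"number_of_shards": 1},
                "allocate": {"number_of_replicas": 1},
                "readonly": {},
            },
        },
        "cold": {
            "min_age": "30d",
            "actions": {
                # "logs-s3-repo" is an assumed snapshot repository name.
                "searchable_snapshot": {"snapshot_repository": "logs-s3-repo"}
            },
        },
        "delete": {"min_age": "90d", "actions": {"delete": {}}},
    }
}

es.ilm.put_lifecycle(name="logs-tiered-policy", policy=policy)
```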
Built-in Tiered Storage in Practice
Elasticsearch ILM is the most battle-tested tiering implementation in the ecosystem. It provides lifecycle policies that automatically roll over indices when they hit a size or age threshold, shrink them, freeze them, and eventually delete them. Hot nodes run NVMe SSDs with high CPU for indexing and search. Warm nodes run cheaper SSDs with less CPU. Cold and frozen nodes back their indices with S3 searchable snapshots, paying almost nothing for storage while keeping the data queryable (at higher latency). The query API is identical across tiers. The application does not know or care which tier serves a particular index.
Kafka Tiered Storage (KIP-405) solves a different problem. Kafka brokers traditionally keep all log segments on local disk. With 90 days of retention on a topic doing 1 TB/day, that requires 90 TB of broker disk. Tiered storage offloads older segments to S3 while keeping recent segments on local SSD. Consumers reading the latest data hit local disk with normal latency. Consumers replaying from 3 months ago transparently fetch from S3. This makes it practical to set Kafka retention to "forever" without breaking the bank on broker hardware.
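A sketch of the per-topic opt-in, assuming a Kafka 3.6+ cluster with remote storage already enabled at the broker level; the topic name, partition count, and retention values are illustrative:

```python
from confluent_kafka.admin import AdminClient, NewTopic

admin = AdminClient({"bootstrap.servers": "localhost:9092"})

# Keep ~2 days of segments on local broker disk; everything older is served
# from remote (S3) storage up to the 90-day total retention.
topic = NewTopic(
    "orders.events",
    num_partitions=12,
    replication_factor=3,
    config={
        "remote.storage.enable": "true",                       # per-topic opt-in
        "local.retention.ms": str(2 * 24 * 60 * 60 * 1000),    # local hot window
        "retention.ms": str(90 * 24 * 60 * 60 * 1000),         # total retention
    },
)

futures = admin.create_topics([topic])
futures["orders.events"].result()  # raises if topic creation failed
```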
ClickHouse with S3-backed MergeTree works at the partition level. Recent partitions (this week's data) live on local NVMe for fast queries. Old partitions automatically move to S3 based on a storage_policy configuration. Queries that span both local and S3 partitions run transparently. The catch: S3-backed partitions are slower to scan, so cold queries take longer. Pre-aggregate old data into rollup tables for sub-second analytics on historical ranges.
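A sketch of the partition-level setup in DDL, assuming a storage policy named 'hot_and_cold' (local disk plus an S3 disk) is already defined in the server's storage configuration; the table and column names are illustrative:

```python
import clickhouse_connect

client = clickhouse_connect.get_client(host="localhost")

client.command("""
CREATE TABLE IF NOT EXISTS events
(
    event_time DateTime,
    user_id    UInt64,
    payload    String
)
ENGINE = MergeTree
PARTITION BY toYYYYMMDD(event_time)
ORDER BY (user_id, event_time)
TTL event_time + INTERVAL 7 DAY TO VOLUME 'cold'   -- move old parts to the S3-backed volume
SETTINGS storage_policy = 'hot_and_cold'
""")
```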
Custom tiering (Redis + PostgreSQL + S3). When the database does not have built-in tiering, the application handles routing. Check Redis first. On miss, check PostgreSQL. If the data is older than the warm tier boundary, query S3 via Athena or a similar query engine. This is what most teams build, and it works fine. The downside is that every new feature needs to be aware of the tiering logic.
Cost Analysis
| Tier | Storage | Cost per TB/month | Read Latency | Example Tech |
|---|---|---|---|---|
| Hot | In-memory | ~$3,000 | < 1ms | Redis, Memcached |
| Warm | SSD-backed DB | ~$100 | 5-50ms | PostgreSQL, ES hot nodes |
| Cold | Object storage | ~$23 | 100ms-5s | S3, ES frozen tier |
| Frozen | Archive | ~$4 | 1-12 hours | S3 Glacier Deep Archive |
Real example. A system storing 10 TB total with typical access distribution: 100 GB hot + 1 TB warm + 9 TB cold.
- With tiering: $300 (hot) + $100 (warm) + $207 (cold) = $607/month
- All in PostgreSQL: ~$1,000/month, and cold-tier queries are slow because the database is scanning 10 TB
- All in Redis: ~$30,000/month. Do not laugh, I have seen teams do this
The tiering payoff grows with data volume. Scale the same distribution to 100 TB and the all-PostgreSQL bill is roughly $10K/month against a tiered bill of about $6K, and the gap widens dramatically if any meaningful fraction of that data would otherwise sit in memory.
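The arithmetic behind those numbers, as a quick back-of-the-envelope check using the approximate per-TB prices from the table above:

```python
# Approximate $/TB/month from the cost table above.
COST_PER_TB = {"hot": 3000, "warm": 100, "cold": 23, "frozen": 4}


def monthly_cost(tb_by_tier: dict[str, float]) -> float:
    return sum(COST_PER_TB[tier] * tb for tier, tb in tb_by_tier.items())


# 10 TB with the typical skew: 100 GB hot, 1 TB warm, 9 TB cold.
tiered   = monthly_cost({"hot": 0.1, "warm": 1, "cold": 9})   # ~$607
all_warm = monthly_cost({"warm": 10.1})                        # ~$1,010 (all PostgreSQL)
all_hot  = monthly_cost({"hot": 10.1})                         # ~$30,300 (all Redis)

# Same skew scaled to 100 TB: the gap against a single warm tier keeps growing.
tiered_100tb   = monthly_cost({"hot": 1, "warm": 10, "cold": 89})   # ~$6,047
all_warm_100tb = monthly_cost({"warm": 100})                        # ~$10,000

print(f"10 TB  tiered ${tiered:,.0f} vs all-warm ${all_warm:,.0f} vs all-hot ${all_hot:,.0f}")
print(f"100 TB tiered ${tiered_100tb:,.0f} vs all-warm ${all_warm_100tb:,.0f}")
```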
When Not to Tier
Tiering adds complexity. Every query needs to know which tier to hit. Every new feature needs to respect tier boundaries. Debugging becomes harder because data lives in three places.
Skip tiering if the total dataset fits on a single SSD (under 500 GB) and query latency is acceptable. One tier, one technology, one place to look when something breaks.
Also skip it if the access patterns are uniform. If every record gets queried with roughly equal frequency (a configuration store, a small product catalog), there is no "cold" data to move. The whole point of tiering is exploiting the fact that most data is rarely accessed. If that is not true for a given workload, tiering is overhead with no payoff.
Failure Scenarios
Scenario 1: Hot tier goes down, warm tier gets crushed. Redis crashes. Every request that used to hit Redis now falls through to PostgreSQL. The database was sized for warm-tier load (50 QPS), not the full hot-tier load (5,000 QPS). Connection pool exhaustion hits within seconds. Queries start timing out. The monitoring dashboard, which also queries PostgreSQL, goes dark.
Detection: Alert on Redis availability and on PostgreSQL connection pool utilization crossing 80%.
Prevention: Put a circuit breaker on the hot-tier fallback path. When Redis is down, return a degraded response (cached from the last successful read, or a "temporarily unavailable" status) instead of blindly forwarding all traffic to the warm tier. Pre-compute a capacity buffer for how much extra load the warm tier can absorb and set the circuit breaker threshold accordingly. In practice, a warm tier can usually handle 2-3x its normal load for short bursts, not 100x.
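A minimal sketch of that guardrail; the 3x budget, the 1-second window, and the redis_lookup/postgres_lookup stubs are illustrative assumptions:

```python
import time


class FallbackBreaker:
    """Caps how much hot-tier traffic may fall through to the warm tier.

    When Redis is down, only max_fallback_qps requests per second are forwarded
    to PostgreSQL; the rest get a degraded response instead of piling onto a
    database sized for a fraction of the load."""

    def __init__(self, max_fallback_qps: float):
        self.max_fallback_qps = max_fallback_qps
        self._window_start = time.monotonic()
        self._count = 0

    def allow_fallback(self) -> bool:
        now = time.monotonic()
        if now - self._window_start >= 1.0:        # new 1-second window
            self._window_start, self._count = now, 0
        if self._count < self.max_fallback_qps:
            self._count += 1
            return True
        return False


def redis_lookup(order_id: str) -> dict:
    """Hot-tier read; raises ConnectionError when Redis is unreachable (stub)."""
    raise ConnectionError("redis down")


def postgres_lookup(order_id: str) -> dict:
    """Warm-tier read against PostgreSQL (stubbed for the sketch)."""
    return {"order_id": order_id, "status": "shipped", "tier": "warm"}


# Warm tier normally sees ~50 QPS; assume it can absorb ~3x for short bursts.
breaker = FallbackBreaker(max_fallback_qps=150)


def get_order_status(order_id: str) -> dict:
    try:
        return redis_lookup(order_id)                 # hot path
    except ConnectionError:
        if breaker.allow_fallback():
            return postgres_lookup(order_id)          # bounded warm-tier fallback
        return {"status": "temporarily_unavailable"}  # degraded response
```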
Scenario 2: Cold-tier query blocks the API. A customer support agent searches 3 years of order history. The Athena query scans 500 GB of Parquet files and takes 8 seconds. The API gateway has a 5-second timeout. The agent sees a 504 error. They retry. Now two Athena queries are running.
Detection: Track cold-tier query latency as a separate SLI from hot/warm. Alert when p95 exceeds the API gateway timeout.
Fix: Never block a synchronous API on a cold-tier scan. Use an async query pattern: return a job ID immediately, let the client poll for results or subscribe to a notification. The UI shows "Loading historical data..." instead of a timeout error. For common cold-tier queries, pre-aggregate results into the warm tier on a nightly batch job so the live query never needs to touch S3.
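One shape the async pattern can take, sketched with plain threads and an in-memory job table; a real system would persist jobs, run them on a worker pool, and call Athena where the sleep stands in:

```python
import threading
import time
import uuid

jobs: dict[str, dict] = {}  # job_id -> {"status", "result"}


def run_cold_query(job_id: str, user_id: int, days: int) -> None:
    """Worker: executes the slow archive scan off the request path.
    time.sleep stands in for an Athena query over S3/Parquet."""
    time.sleep(8)
    jobs[job_id].update(status="done", result={"user_id": user_id, "days": days, "orders": []})


def submit_history_query(user_id: int, days: int) -> str:
    """API handler: returns a job ID immediately instead of blocking."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "running", "result": None}
    threading.Thread(target=run_cold_query, args=(job_id, user_id, days), daemon=True).start()
    return job_id


def poll_history_query(job_id: str) -> dict:
    """API handler the client polls; the UI shows 'Loading historical data...' until done."""
    return jobs.get(job_id, {"status": "not_found"})


# Client flow: submit, then poll until the status flips to "done".
jid = submit_history_query(user_id=123, days=1095)
print(poll_history_query(jid))  # {'status': 'running', 'result': None}
```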
Key Points
- Not all data deserves the same storage. Hot data sits in memory or on NVMe SSDs, warm data on cheaper SSDs, cold data on object storage. The cost difference between the fastest and cheapest tiers is two to three orders of magnitude
- The boundary between tiers is defined by access frequency and latency requirements, not by data age alone. A 3-year-old record queried daily is hot, not cold
- Promotion and demotion policies determine when data moves between tiers. Time-based is simplest, access-count-based is most accurate, and most teams use a hybrid
- Elasticsearch, ClickHouse, and Kafka all have built-in tiered storage. A custom query routing layer is not always necessary
- The query pattern matters as much as the storage tier. A cold-tier query that scans terabytes needs a different approach (async, pre-aggregated) than a hot-tier point lookup
Tool Comparison
| Tool | Type | Best For | Scale |
|---|---|---|---|
| Elasticsearch ILM | Open Source | Log and event data lifecycle with automatic rollover, shrink, freeze, and delete | Medium-Enterprise |
| ClickHouse Tiered Storage | Open Source | Analytics data with volume-based policies, S3-backed MergeTree for cold partitions | Medium-Enterprise |
| Kafka Tiered Storage (KIP-405) | Open Source | Event log retention beyond broker disk, transparent S3 offload for old segments | Large-Enterprise |
| AWS S3 Intelligent-Tiering | Managed | Object storage with automatic access-pattern-based tiering, no retrieval fees | Small-Enterprise |
| Snowflake | Commercial | Transparent hot/warm/cold with auto-scaling compute per tier, zero admin | Medium-Enterprise |
Common Mistakes
- Tiering by age alone. A 3-year-old record that gets queried daily is hot, not cold. Measure access frequency before drawing tier boundaries
- No warm tier. Going straight from in-memory cache to S3 creates a latency cliff where responses jump from 1ms to 3 seconds with nothing in between
- Forgetting that cold-tier queries still need to be usable. Users do not care where the data lives. If the query takes 8 seconds, that is the experience
- Not measuring access patterns before choosing tier boundaries. Most teams guess wrong about what is hot. Instrument first, tier second
- Over-engineering tiering for small datasets. If everything fits on a single SSD, skip the complexity and just use one SSD