HTTP Caching Deep Dive
HTTP caching layers browser, CDN, and proxy caches using Cache-Control directives, ETags, and conditional requests to serve repeat traffic without hitting origin.
The Problem
Without caching, every request hits origin — wasting bandwidth and increasing latency. A single uncached image served 10 million times per day means 10 million origin round trips. HTTP caching solves this at multiple layers (browser, CDN, proxy), but misconfigured headers cause either stale content served to users or zero cache benefit despite the infrastructure.
Mental Model
Like a chain of libraries — the browser is a personal bookshelf (fastest, smallest), the CDN is the local branch (close, moderate size), and the origin is the national library (authoritative, slowest). Each level checks if it has the book before asking the next one.
How It Works
HTTP caching is one of the most impactful performance optimizations for web applications. Properly configured caching eliminates redundant data transfer, can reduce origin server load by orders of magnitude, and cuts response latency from hundreds of milliseconds to near zero on a browser cache hit. Improperly configured caching serves stale content to users, leaks private data through shared caches, or provides no benefit at all.
The HTTP caching model operates across multiple layers, each with different characteristics and failure modes.
The Cache Hierarchy
Every HTTP response potentially flows through four cache layers:
- Browser cache (private): Stores responses on disk or in memory. Fastest possible retrieval, with no network round trip at all. Controlled by max-age and no-store.
- Service Worker cache (private): Application-controlled cache using the Cache API. Enables offline-first behavior and custom caching strategies.
- CDN / Edge cache (shared): Globally distributed. Responses cached at the edge serve users from the nearest PoP. Controlled by s-maxage.
- Reverse proxy cache (shared): Varnish or Nginx in front of the origin. Absorbs traffic spikes and protects the application tier.
A well-configured system has most requests satisfied by layer 1 or 2, with the CDN handling the rest. The origin should only see truly unique or freshly changed requests.
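The layered lookup described above can be sketched as a toy simulation. This is only an illustration: `CacheLayer` and `fetch` are hypothetical names, and the sketch ignores freshness, scope (private vs shared), and eviction entirely.

```python
from typing import Callable, Dict, List, Tuple


class CacheLayer:
    """One cache tier (browser, service worker, CDN, reverse proxy)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.store: Dict[str, bytes] = {}


def fetch(url: str, layers: List[CacheLayer],
          origin: Callable[[str], bytes]) -> Tuple[bytes, str]:
    """Return (body, name of the tier that answered), nearest tier first."""
    for depth, layer in enumerate(layers):
        if url in layer.store:
            body = layer.store[url]
            # Fill the tiers in front of the one that hit.
            for nearer in layers[:depth]:
                nearer.store[url] = body
            return body, layer.name
    # Every tier missed: ask the origin and populate all tiers on the way back.
    body = origin(url)
    for layer in layers:
        layer.store[url] = body
    return body, "origin"
```

With a browser tier and a CDN tier, the first request for a URL is answered by the origin, and repeat requests are answered by the browser tier until its entry is dropped.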
Cache-Control: The Complete Directive Set
The Cache-Control header is where all caching decisions begin. Understanding every directive is non-negotiable for a staff engineer.
Freshness directives:
| Directive | Meaning |
|---|---|
| max-age=3600 | Cache is fresh for 3600 seconds from the time the response was generated |
| s-maxage=86400 | Override max-age for shared caches only (CDN gets 24h, browser gets max-age) |
| no-cache | Store in cache but always revalidate with origin before serving |
| no-store | Never store this response anywhere: not browser, not CDN, not proxy |
| immutable | The body will not change while the response is fresh, so skip revalidation on user-initiated reload (safe for fingerprinted assets) |
Revalidation directives:
| Directive | Meaning |
|---|---|
| must-revalidate | Once stale, the cache MUST NOT serve the response without revalidating; there is no implicit grace period |
| stale-while-revalidate=60 | For up to 60 seconds after the response goes stale, serve it immediately and revalidate in the background |
| stale-if-error=300 | If origin is down, serve stale content for up to 300 seconds rather than returning an error |
Scope directives:
| Directive | Meaning |
|---|---|
| public | Any cache (including shared CDN caches) may store this response |
| private | Only the browser may cache this; CDNs and proxies must not store it |
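To make the directive tables concrete, here is a minimal sketch of how a cache might read a Cache-Control value. The function names are hypothetical, and the parser is deliberately simplified: it splits on commas, so it would mishandle quoted arguments that contain commas.

```python
from typing import Dict, Optional

Directives = Dict[str, Optional[str]]


def parse_cache_control(header: str) -> Directives:
    """Split a Cache-Control value into {directive: argument or None}."""
    directives: Directives = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, value = part.partition("=")
        directives[name.strip().lower()] = value.strip().strip('"') if sep else None
    return directives


def may_store(directives: Directives, shared: bool) -> bool:
    """no-store forbids every cache; private forbids shared caches only."""
    if "no-store" in directives:
        return False
    if shared and "private" in directives:
        return False
    return True


def freshness_lifetime(directives: Directives, shared: bool) -> Optional[int]:
    """Seconds the response counts as fresh, or None if no explicit lifetime."""
    if "no-cache" in directives:
        return 0  # modeled as "always stale": revalidate before every reuse
    if shared and "s-maxage" in directives:
        return int(directives["s-maxage"])
    if "max-age" in directives:
        return int(directives["max-age"])
    return None
```

For `public, max-age=3600, s-maxage=86400`, a shared cache gets a lifetime of 86400 seconds and a browser gets 3600, which matches the s-maxage row above.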
Conditional Requests: ETags and 304s
When a cached response goes stale (max-age exceeded), the cache does not necessarily discard it. Instead, it sends a conditional request to check if the content actually changed.
The flow:
First request:
GET /api/products
→ 200 OK
→ ETag: "a1b2c3d4"
→ Cache-Control: max-age=60
After 60 seconds (stale):
GET /api/products
If-None-Match: "a1b2c3d4"
→ 304 Not Modified (no body — 0 bytes transferred)
→ Cache extends freshness for another max-age period
If content changed:
GET /api/products
If-None-Match: "a1b2c3d4"
→ 200 OK (new body)
→ ETag: "e5f6g7h8"
The 304 response is small (just headers, no body), saving potentially megabytes of redundant transfer. For API responses and HTML pages that change infrequently, conditional requests provide near-cache-hit performance while guaranteeing freshness.
Weak vs Strong ETags: A weak ETag (W/"a1b2c3") indicates semantic equivalence: the content may differ in insignificant ways (whitespace, formatting). A strong ETag guarantees byte-for-byte identity. Strong ETags are commonly derived from a content hash, though some servers build them from file metadata such as modification time and size instead.
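The conditional-request flow can be sketched from the server's side. This is a framework-free illustration under stated assumptions: `make_etag`, `etag_matches`, and `respond` are hypothetical helpers, the ETag is a truncated SHA-256 of the body, and the request headers are assumed to arrive in a plain dict with canonical capitalization.

```python
import hashlib
from typing import Dict, Tuple


def make_etag(body: bytes) -> str:
    """Strong ETag derived from a content hash (quoted, as the header requires)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def _opaque(tag: str) -> str:
    """Drop surrounding whitespace and any W/ weak prefix."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def etag_matches(if_none_match: str, etag: str) -> bool:
    """Weak comparison, which If-None-Match uses: the W/ prefix is ignored."""
    if if_none_match.strip() == "*":
        return True
    candidates = [_opaque(t) for t in if_none_match.split(",")]
    return _opaque(etag) in candidates


def respond(body: bytes, request_headers: Dict[str, str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body), answering 304 when the validator matches."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": "max-age=60"}
    validator = request_headers.get("If-None-Match")
    if validator is not None and etag_matches(validator, etag):
        return 304, headers, b""  # no body: the client reuses its stored copy
    return 200, headers, body
```

Sending the ETag from a previous 200 back in If-None-Match yields a 304 with an empty body while the content is unchanged, and a fresh 200 with a new ETag once it changes.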
The Vary Header: Cache Key Expansion
The Vary header fundamentally changes how caches key responses. Without Vary, the cache key is simply the URL. With Vary: Accept-Encoding, the cache stores separate entries for each encoding:
/app.js (Accept-Encoding: gzip) → cached gzip response
/app.js (Accept-Encoding: br) → cached brotli response
/app.js (Accept-Encoding: identity) → cached uncompressed response
This is necessary and correct for content encoding. But Vary: Cookie is a disaster for shared caches because every authenticated user sends a different cookie, creating a unique cache entry per user. The CDN hit rate drops to near zero.
The fix: use Cache-Control: private for user-specific content instead of Vary: Cookie. Private tells the CDN not to cache it at all, which is the correct behavior.
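A rough sketch of how Vary expands the cache key may help here. `cache_key` is a hypothetical helper; real caches usually also normalize header values (for example, treating `gzip, br` and `br, gzip` alike), which this sketch does not attempt.

```python
from typing import Dict, Tuple


def cache_key(url: str, request_headers: Dict[str, str], vary: str) -> Tuple:
    """Build a cache key from the URL plus the request headers named in Vary."""
    lowered = {k.lower(): v for k, v in request_headers.items()}
    names = sorted(n.strip().lower() for n in vary.split(",") if n.strip())
    return (url,) + tuple((n, lowered.get(n, "")) for n in names)
```

With `Vary: Accept-Encoding`, gzip and brotli requests for /app.js map to two keys. With `Vary: Cookie`, one hundred users with distinct session cookies map to one hundred keys for the same URL, which is why the shared hit rate collapses.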
Caching Strategies by Content Type
Fingerprinted static assets (JS, CSS, images with hashes in filenames):
Cache-Control: public, max-age=31536000, immutable
The filename changes on every deploy (app.a1b2c3.js → app.d4e5f6.js), so the old URL is never requested again. Setting max-age to one year and immutable means supporting browsers skip revalidation for the whole year, even when the user reloads the page. A hard refresh that bypasses the cache still refetches the file.
HTML pages (semi-dynamic):
Cache-Control: public, s-maxage=300, stale-while-revalidate=60
ETag: "page-hash-abc123"
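The CDN caches for 5 minutes. For the next 60 seconds after that, stale-while-revalidate lets it serve the stale page instantly while fetching fresh content in the background. On a steadily requested page, users rarely wait for origin; a request that arrives after the 60-second window has closed waits for a full revalidation.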
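The three states a CDN entry passes through under this policy can be sketched as a small decision function. `cache_decision` and its return labels are hypothetical; the defaults in the usage note mirror s-maxage=300 and stale-while-revalidate=60 from the header above.

```python
def cache_decision(age: int, max_age: int, swr: int = 0) -> str:
    """Classify a stored response by its age in seconds."""
    if age < max_age:
        return "serve-fresh"  # still within the freshness lifetime
    if age < max_age + swr:
        # Stale, but inside the stale-while-revalidate window:
        # answer immediately and refresh in the background.
        return "serve-stale-and-revalidate"
    return "revalidate-first"  # too stale: the client waits for origin
```

For example, `cache_decision(330, 300, 60)` falls in the background-revalidation window, while `cache_decision(400, 300, 60)` requires a blocking revalidation.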
API responses (user-specific):
Cache-Control: private, no-cache
ETag: "user-data-xyz789"
Only the browser caches it. Every request revalidates with If-None-Match, but most responses are 304s (saving bandwidth). The CDN never stores it.
Sensitive data (payment, health records):
Cache-Control: no-store
Nothing is stored anywhere. Every request is a full round trip to origin.
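One way to apply the four policies above consistently is a single function that picks the header per response. This is a sketch under assumptions: `cache_control_for` is a hypothetical helper, the fingerprint pattern (at least six hex characters before the extension) is illustrative, and anything that is neither fingerprinted nor user-specific is treated as a semi-dynamic page.

```python
import re

# Assumed fingerprint shape: name.<6+ hex chars>.<extension>
FINGERPRINT = re.compile(r"\.[0-9a-f]{6,}\.(?:js|css|png|jpg|svg|woff2)$")


def cache_control_for(path: str, user_specific: bool = False,
                      sensitive: bool = False) -> str:
    """Pick a Cache-Control value following the strategies listed above."""
    if sensitive:
        return "no-store"
    if user_specific:
        return "private, no-cache"
    if FINGERPRINT.search(path):
        return "public, max-age=31536000, immutable"
    return "public, s-maxage=300, stale-while-revalidate=60"
```

Checking the sensitive and user-specific flags first keeps the failure mode conservative: a misclassified path is more likely to be under-cached than leaked through a shared cache.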
Cache Invalidation: The Hard Problem
Phil Karlton's quote — "there are only two hard things in computer science: cache invalidation and naming things" — exists because invalidation is genuinely difficult.
TTL-based expiration: The simplest approach. Set max-age=300 and accept up to 5 minutes of staleness. No active invalidation needed. Works well for content where slight staleness is acceptable.
Purge by URL: CDNs expose purge APIs (POST /purge?url=...). Works for individual resources but fails for complex sites where one data change affects dozens of URLs.
Surrogate keys / Cache tags: The scalable solution. Tag every response with metadata (Surrogate-Key: product-123 category-shoes). When product 123 changes, purge all responses tagged product-123. Fastly and Cloudflare both support this pattern.
# Fastly: purge by surrogate key
curl -X POST https://api.fastly.com/service/{id}/purge/product-123 \
  -H "Fastly-Key: {token}"
# Cloudflare: purge by cache tag (authenticated request with a JSON body)
curl -X POST https://api.cloudflare.com/client/v4/zones/{zone}/purge_cache \
  -H "Authorization: Bearer {token}" \
  -H "Content-Type: application/json" \
  -d '{"tags": ["product-123"]}'
Event-driven invalidation: Connect cache purges to data change events. When a CMS publish event fires, a webhook triggers CDN purges for affected pages. This closes the gap between data change and cache update from minutes to seconds.
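The surrogate-key idea can be modeled in memory to show why it scales better than purging by URL. This is a sketch, not any CDN's implementation: `TaggedCache` and `on_product_updated` are hypothetical, and the event handler stands in for a webhook that would call a real purge API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set


class TaggedCache:
    """Cache that indexes entries by tag so one purge can drop many URLs."""

    def __init__(self) -> None:
        self._entries: Dict[str, bytes] = {}
        self._urls_by_tag: Dict[str, Set[str]] = defaultdict(set)
        self._tags_by_url: Dict[str, Set[str]] = {}

    def put(self, url: str, body: bytes, tags: Iterable[str]) -> None:
        self._forget(url)  # drop any older entry and its tag links
        self._entries[url] = body
        self._tags_by_url[url] = set(tags)
        for tag in self._tags_by_url[url]:
            self._urls_by_tag[tag].add(url)

    def get(self, url: str) -> Optional[bytes]:
        return self._entries.get(url)

    def purge_tag(self, tag: str) -> int:
        """Remove every entry carrying the tag; return how many were removed."""
        urls = list(self._urls_by_tag.get(tag, ()))
        for url in urls:
            self._forget(url)
        return len(urls)

    def _forget(self, url: str) -> None:
        self._entries.pop(url, None)
        for tag in self._tags_by_url.pop(url, set()):
            self._urls_by_tag[tag].discard(url)


def on_product_updated(cache: TaggedCache, product_id: int) -> int:
    """Event handler: a data change purges only the affected entries."""
    return cache.purge_tag("product-%d" % product_id)
```

If a product page and a category listing both carry `product-123`, one call to `on_product_updated(cache, 123)` removes both, while entries tagged only with other products stay cached.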
Measuring Cache Effectiveness
Cache performance is measured by hit rate — the percentage of requests served from cache without reaching origin.
| Hit Rate | Assessment | Action |
|---|---|---|
| < 50% | Poor | Check Cache-Control headers, Vary usage, and query parameter handling |
| 50-80% | Moderate | Look for cacheable content served with no-store or short TTLs |
| 80-95% | Good | Typical for well-configured sites with mixed static/dynamic content |
| > 95% | Excellent | Common for static-heavy sites (media, documentation, blogs) |
The Age response header reveals how long a response has been in the cache. If every response shows Age: 0 or carries no Age header at all, shared caches are probably not serving it. CDN analytics dashboards (Cloudflare, Fastly, CloudFront) provide detailed hit/miss breakdowns by content type.
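The hit-rate table can be turned into a small calculation over cache status values pulled from logs. This is a sketch with assumptions: only the literal status `HIT` counts as a hit, statuses such as revalidated or expired are treated as misses, and the boundary values 80 and 95 are assigned to "Good" because the table leaves them ambiguous.

```python
from typing import Iterable


def hit_rate(statuses: Iterable[str]) -> float:
    """Percentage of requests whose cache status is HIT."""
    normalized = [s.upper() for s in statuses]
    if not normalized:
        return 0.0
    hits = sum(1 for s in normalized if s == "HIT")
    return 100.0 * hits / len(normalized)


def assess(rate: float) -> str:
    """Map a hit rate (in percent) to the assessment bands from the table."""
    if rate < 50:
        return "Poor"
    if rate < 80:
        return "Moderate"
    if rate <= 95:
        return "Good"
    return "Excellent"
```

Three hits out of four requests gives a rate of 75.0, which lands in the "Moderate" band.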
Key Points
- Cache-Control: no-cache does NOT mean 'do not cache.' It means 'cache the response but always revalidate with the origin before using it.' The directive that prevents storing is no-store.
- stale-while-revalidate allows the cache to serve a stale response immediately while fetching a fresh copy in the background, eliminating latency spikes during revalidation.
- The immutable directive tells the browser not to revalidate a resource while it is still fresh, even when the user reloads the page. This is safe for fingerprinted assets like /app.a1b2c3.js and eliminates wasted conditional requests.
- s-maxage overrides max-age for shared caches (CDNs and proxies) without affecting the browser cache. This separates the CDN TTL from the browser TTL.
- The Vary header is a cache key modifier. Setting Vary: Accept-Encoding means the cache stores separate gzip and brotli versions. Setting Vary: Cookie effectively disables shared caching because every user's cookie differs.
Key Components
| Component | Role |
|---|---|
| Cache-Control Header | The primary directive that tells browsers and intermediaries whether to cache a response, for how long, and under what conditions it must be revalidated |
| ETag / If-None-Match | An opaque validator (hash or version string) the server attaches to a response — clients send it back in conditional requests to check if the resource changed |
| Vary Header | Tells caches which request headers affect the response, so the cache stores separate entries for different Accept-Encoding, Accept-Language, or Cookie values |
| 304 Not Modified | The server response code indicating the cached copy is still valid — no body is sent, saving bandwidth while still confirming freshness |
| Surrogate Keys / Cache Tags | CDN-specific metadata that enables surgical cache invalidation — purge all responses tagged 'product-123' instead of guessing URLs |
When to Use
Every HTTP response should have explicit Cache-Control headers. Static assets with fingerprinted filenames get immutable + long max-age. API responses that are user-specific get private, no-cache or no-store. Semi-dynamic pages (product listings, blog posts) benefit from s-maxage with stale-while-revalidate on the CDN.
Tool Comparison
| Tool | Type | Best For | Scale |
|---|---|---|---|
| Varnish | Open Source | High-performance HTTP reverse proxy cache with VCL scripting for custom cache logic — handles millions of req/sec in front of origin servers | Medium-Enterprise |
| Nginx proxy_cache | Open Source | Built-in caching for Nginx reverse proxy, simple configuration, good for single-origin setups without complex invalidation needs | Small-Enterprise |
| Cloudflare | Commercial | Global CDN with automatic caching, Tiered Cache to reduce origin load, and Cache Rules for fine-grained control without touching origin headers | Small-Enterprise |
| Squid | Open Source | Forward proxy cache for corporate networks and ISPs — caches outbound traffic to reduce bandwidth consumption | Small-Enterprise |
Debug Checklist
- Inspect cache headers: curl -I https://example.com shows Cache-Control, ETag, Last-Modified, Vary, and Age headers — Age > 0 means the response was served from a cache.
- Check browser cache behavior: Open DevTools Network tab, look for '(disk cache)' or '(memory cache)' in the Size column. Gray 304 responses indicate successful revalidation.
- Test CDN caching: Compare the CF-Cache-Status (Cloudflare) or X-Cache (AWS CloudFront) response header — HIT means CDN served it, MISS means it went to origin.
- Verify Vary header impact: Request the same URL with different Accept-Encoding values and check if the CDN returns different cached copies or always misses.
- Debug conditional requests: Send a manual request with If-None-Match: <etag> and verify the server returns 304 with no body, not a full 200 response.
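The header checks in this list can be folded into one helper that reads a response's headers and guesses whether a cache answered. This is a heuristic sketch only: `served_from_cache` is a hypothetical name, it assumes X-Cache values begin with "Hit" or "Miss" as CloudFront's do, and it follows the checklist's rule of thumb that Age: 0 means no cache was involved.

```python
from typing import Dict, Optional


def served_from_cache(headers: Dict[str, str]) -> Optional[bool]:
    """Return True/False when the headers give a signal, None when they do not."""
    h = {k.lower(): v for k, v in headers.items()}
    cf_status = h.get("cf-cache-status", "").strip().upper()
    if cf_status:
        return cf_status == "HIT"
    x_cache = h.get("x-cache", "").strip().lower()
    if x_cache:
        return x_cache.startswith("hit")
    age = h.get("age", "").strip()
    if age.isdigit():
        return int(age) > 0
    return None  # no cache-related header present
```

Feeding it the headers printed by `curl -I` (parsed into a dict) gives a quick first answer before digging into CDN dashboards.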
Common Mistakes
- Confusing no-cache with no-store. Setting no-cache still caches the response — it just forces revalidation. Sensitive data (account pages, payment info) needs no-store.
- Serving static assets with short max-age instead of using fingerprinted filenames with immutable. Every time the short TTL expires, clients send conditional requests that almost all return 304, even though nothing changed.
- Setting Vary: Cookie on CDN-cached content, which creates a unique cache entry per user and makes the CDN effectively useless, with a hit rate near 0%.
- Not setting Cache-Control at all. Without explicit directives, browser heuristics apply — typically caching for 10% of the age since Last-Modified, which is unpredictable.
- Purging CDN caches by URL pattern without realizing that query parameters, Vary headers, and content negotiation create multiple cache entries per URL.
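The heuristic mentioned in the list above (caching for roughly 10% of the time since Last-Modified when Cache-Control is absent) can be worked through in a few lines. This is a sketch: `heuristic_freshness` is a hypothetical helper, the 10% fraction is the typical value named above rather than a guarantee for any particular browser, and the header values in the usage note are made up.

```python
from email.utils import parsedate_to_datetime


def heuristic_freshness(date_header: str, last_modified_header: str,
                        fraction: float = 0.1) -> float:
    """Seconds a cache might treat the response as fresh without Cache-Control."""
    date = parsedate_to_datetime(date_header)
    last_modified = parsedate_to_datetime(last_modified_header)
    seconds_since_change = (date - last_modified).total_seconds()
    # A Last-Modified in the future yields no heuristic freshness.
    return max(0.0, seconds_since_change * fraction)
```

A response dated `Wed, 21 Oct 2015 07:28:00 GMT` with `Last-Modified: Sun, 11 Oct 2015 07:28:00 GMT` was last changed ten days earlier, so the heuristic lifetime comes out at about one day. A file edited a minute ago gets about six seconds, which illustrates why the result is unpredictable.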
Real World Usage
- Netflix uses aggressive caching at their Open Connect CDN appliances: a single popular title is cached on ISP-local servers, so the origin serves it once and the cache handles millions of streams.
- GitHub serves static assets (avatars, CSS, JS) with Cache-Control: max-age=31536000, immutable and fingerprinted URLs, so the browser never revalidates these resources.
- Shopify uses Cloudflare with surrogate keys to invalidate product page caches when merchants update inventory. Changing a price purges only that product's cache entries, not the entire storefront.
- Google Search serves its homepage with Cache-Control: private, max-age=0, meaning every request hits the origin, because the page is personalized and latency-sensitive enough that Google's infrastructure handles it anyway.