HTTP/2 — Multiplexing Revolution
HTTP/2 multiplexes many requests over one TCP connection using binary framing, slashing latency and connection overhead.
The Problem
HTTP/1.1's one-request-at-a-time model forces browsers to open 6+ TCP connections per origin, wasting resources and limiting performance. How is this fixed without breaking existing HTTP semantics?
Mental Model
Like a multi-lane highway through a single tunnel: many conversations, one connection.
How It Works
HTTP/2 solves HTTP/1.1's fundamental performance problem: head-of-line blocking at the application layer. Instead of sending one request and waiting for its response before sending the next, HTTP/2 introduces a binary framing layer that breaks all HTTP messages into small frames, tags them with a stream ID, and interleaves them over a single TCP connection.
The key abstraction is the stream — a bidirectional sequence of frames between client and server. Each request-response pair gets its own stream. Streams are independent: if stream 3 is waiting for a database query, streams 5, 7, and 9 can still send and receive data. This is multiplexing, and it's the single biggest improvement over HTTP/1.1.
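The interleaving can be sketched in a few lines. This is a toy model, not a real HTTP/2 stack: stream bodies are chopped into frames tagged with their stream ID, then scheduled round-robin over one "wire" (frame size and scheduling policy are illustrative).

```python
# Toy sketch of stream multiplexing: frames from independent streams
# are interleaved over a single connection. Illustrative only.
from itertools import zip_longest

def frames(stream_id, body, frame_size=4):
    """Split one stream's body into (stream_id, chunk) frames."""
    return [(stream_id, body[i:i + frame_size])
            for i in range(0, len(body), frame_size)]

def multiplex(streams):
    """Round-robin interleave frames from independent streams."""
    wire = []
    for batch in zip_longest(*(frames(sid, body) for sid, body in streams)):
        wire.extend(f for f in batch if f is not None)
    return wire

wire = multiplex([(3, b"AAAAAAAA"), (5, b"BBBB"), (7, b"CCCCCCCC")])
# Stream 5 finishes early; streams 3 and 7 keep going without waiting.
print([sid for sid, _ in wire])  # → [3, 5, 7, 3, 7]
```

No stream ever waits on another: that is the whole point of the abstraction.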
Binary Framing — The Foundation
Every HTTP/2 message is split into frames:
+-----------------------------------------------+
|                 Length (24)                   |
+---------------+---------------+---------------+
|   Type (8)    |   Flags (8)   |
+-+-------------+---------------+---------------+
|R|          Stream Identifier (31)             |
+=+=============================================+
|           Frame Payload (0 ...)               |
+-----------------------------------------------+
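The fixed 9-byte header above can be decoded with a few byte operations. A minimal sketch (type-code names per RFC 7540; only a handful of types mapped here):

```python
# Minimal decoder for the 9-byte HTTP/2 frame header (RFC 7540 §4.1).
# A sketch, not a full frame parser.
FRAME_TYPES = {0x0: "DATA", 0x1: "HEADERS", 0x3: "RST_STREAM",
               0x4: "SETTINGS", 0x7: "GOAWAY", 0x8: "WINDOW_UPDATE"}

def parse_frame_header(header: bytes):
    assert len(header) == 9
    length = int.from_bytes(header[0:3], "big")       # 24-bit payload length
    frame_type, flags = header[3], header[4]          # 8 bits each
    stream_id = int.from_bytes(header[5:9], "big") & 0x7FFFFFFF  # clear R bit
    return length, FRAME_TYPES.get(frame_type, frame_type), flags, stream_id

# A HEADERS frame: length=13, type=0x1, flags=0x4 (END_HEADERS), stream 3
hdr = b"\x00\x00\x0d\x01\x04\x00\x00\x00\x03"
print(parse_frame_header(hdr))  # → (13, 'HEADERS', 4, 3)
```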
Frame types include:
- HEADERS — carries HTTP headers (compressed via HPACK)
- DATA — carries the response/request body
- SETTINGS — connection-level configuration
- WINDOW_UPDATE — flow control adjustments
- PUSH_PROMISE — server push announcements
- RST_STREAM — cancel a single stream without killing the connection
- GOAWAY — graceful connection shutdown
The binary format is both a blessing and a curse. It can't be debugged with telnet anymore, but it's dramatically more efficient to parse than HTTP/1.1's text format.
HPACK Header Compression
HTTP/1.1 headers are sent as plain text on every request. For a typical API call, headers can be 500-800 bytes — often larger than the actual payload. Multiply that by hundreds of requests per page load, and the waste is significant.
HPACK fixes this with two mechanisms:
- Static Table — 61 pre-defined header entries (:method: GET, :status: 200, etc.) referenced by index
- Dynamic Table — connection-specific table that grows as new headers are seen
# First request — full headers
:method: GET
:path: /api/users
authorization: Bearer eyJhbGciOiJIUzI1NiJ9...
accept: application/json
x-request-id: abc-123
# Second request — most headers indexed
:method: GET → index 2 (static table)
:path: /api/orders → literal (new path)
authorization: ... → indexed (dynamic table, same token)
accept: ... → indexed (dynamic table)
x-request-id: def-456 → literal (new value)
The result: 85-90% reduction in header bytes after the first few requests on a connection. This is particularly impactful for mobile clients on bandwidth-constrained networks.
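The two-table lookup can be sketched as follows. This toy model skips the real encoding details (Huffman coding, variable-length integers, table size eviction); note that RFC 7541 inserts new dynamic entries at the front, so the newest entry always has index 62:

```python
# Toy sketch of HPACK's static/dynamic table lookup, not the real codec.
# Static entries 1-61 are fixed by RFC 7541; dynamic entries start at 62.
STATIC = {1: (":authority", ""), 2: (":method", "GET"), 3: (":method", "POST")}
# ... entries 4-61 omitted for brevity

class HpackTables:
    def __init__(self):
        self.dynamic = []  # newest entry first, per RFC 7541

    def add(self, name, value):
        self.dynamic.insert(0, (name, value))

    def lookup(self, index):
        if index <= 61:
            return STATIC[index]           # static table
        return self.dynamic[index - 62]    # dynamic table

t = HpackTables()
t.add("authorization", "Bearer eyJ...")  # dynamic index 62
t.add("accept", "application/json")      # now 62; authorization shifts to 63
print(t.lookup(2))   # → (':method', 'GET')
print(t.lookup(62))  # → ('accept', 'application/json')
```

On the wire, a fully indexed header costs one or two bytes instead of its full text, which is where the savings come from.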
Server Push — The Promise That Didn't Deliver
Server push allows the server to send resources before the client asks for them. When the server knows that /index.html will need /style.css and /app.js, it can push them immediately.
The idea was compelling. The reality was disappointing:
- Cache invalidation — the server doesn't know what the client already has cached. It pushes resources the client doesn't need.
- Bandwidth waste — pushed resources compete with resources the client actually requested.
- Complexity — getting push right requires intimate knowledge of the client's cache state.
Chrome removed server push support in 2022. The industry moved to 103 Early Hints instead — the server sends Link headers suggesting resources to preload, and the client decides whether to fetch them.
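For comparison, an Early Hints exchange looks like this: an informational 1xx response carrying Link headers, sent ahead of the final response (paths here are illustrative):

```
HTTP/1.1 103 Early Hints
Link: </style.css>; rel=preload; as=style
Link: </app.js>; rel=preload; as=script

HTTP/1.1 200 OK
Content-Type: text/html
...
```

The client can start fetching the hinted resources while the server is still generating the page, and it simply ignores hints for resources it already has cached.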
Stream Priorities and Flow Control
Not all resources are equal. The CSS needed for rendering is more important than a tracking pixel. HTTP/2 includes a priority system where clients can assign:
- Weight (1-256) — relative importance among sibling streams
- Dependency — parent-child relationships creating a priority tree
            Stream 1 (HTML)
            /            \
  Stream 3 (CSS)      Stream 5 (JS)
   weight: 256         weight: 128
        |
  Stream 7 (Font)
   weight: 200
In theory, the server uses this tree to allocate bandwidth. In practice, server implementations range from "fully respects priorities" (H2O) to "completely ignores them" (many Nginx versions). Chrome switched from the tree model to a simpler scheme, and HTTP/3 uses a different priority system entirely.
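The intended allocation is simple to sketch: bandwidth is split among sibling streams in proportion to weight (a minimal model of the RFC 7540 scheme; as noted above, real servers vary widely in how faithfully they implement it):

```python
# Sketch of weight-proportional bandwidth sharing among sibling streams,
# per the RFC 7540 priority model. Illustrative only.
def allocate(bandwidth, siblings):
    """Split bandwidth among (stream_id, weight) siblings by weight."""
    total = sum(weight for _, weight in siblings)
    return {sid: bandwidth * weight // total for sid, weight in siblings}

# CSS (weight 256) vs JS (weight 128) competing under the HTML stream:
print(allocate(9000, [(3, 256), (5, 128)]))  # → {3: 6000, 5: 3000}
```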
Flow control operates at two levels: per-stream and per-connection. Each has a window size (default 65,535 bytes) that the receiver adjusts with WINDOW_UPDATE frames. This prevents a fast sender from overwhelming a slow receiver and ensures one stream can't monopolize the connection.
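The sender-side accounting can be sketched like this: DATA frames consume both windows, WINDOW_UPDATE frames replenish them, and stream ID 0 addresses the connection-level window (a simplified model of RFC 7540 §6.9, not a real implementation):

```python
# Sketch of HTTP/2 flow-control accounting on the sender side. Both the
# per-stream and the per-connection window must have room to send DATA.
DEFAULT_WINDOW = 65_535

class FlowControl:
    def __init__(self):
        self.connection = DEFAULT_WINDOW
        self.streams = {}

    def can_send(self, stream_id, size):
        window = self.streams.setdefault(stream_id, DEFAULT_WINDOW)
        return size <= window and size <= self.connection

    def send_data(self, stream_id, size):
        assert self.can_send(stream_id, size)
        self.streams[stream_id] -= size
        self.connection -= size

    def window_update(self, stream_id, increment):
        if stream_id == 0:          # stream 0 = the whole connection
            self.connection += increment
        else:
            self.streams[stream_id] += increment

fc = FlowControl()
fc.send_data(1, 65_535)      # stream 1 exhausts its window
print(fc.can_send(3, 1))     # → False: the connection window is empty too
fc.window_update(0, 65_535)  # receiver replenishes the connection window
print(fc.can_send(3, 1))     # → True
```

Note how one greedy stream can still stall others by draining the connection-level window, which is exactly what the per-connection limit is meant to let the receiver control.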
The TCP Head-of-Line Blocking Irony
Here's the irony that led to HTTP/3: HTTP/2 solved application-layer head-of-line blocking but made TCP-layer blocking worse.
With HTTP/1.1's six connections, a packet loss on one connection only blocks requests on that connection. With HTTP/2's single connection, a single lost packet blocks all streams until TCP retransmits it. Under packet loss rates above 2%, HTTP/2 can actually perform worse than HTTP/1.1.
HTTP/1.1: 6 connections → packet loss affects 1/6 of requests
HTTP/2: 1 connection → packet loss affects ALL requests
This is why QUIC (HTTP/3) moved to UDP — it implements its own reliability per-stream, so a lost packet in one stream doesn't block others.
Migration Checklist
Moving to HTTP/2 is mostly transparent because the semantics (methods, headers, status codes) are identical to HTTP/1.1. Here's what to watch:
- Enable TLS — browsers require it for HTTP/2 (ALPN negotiation happens during the TLS handshake)
- Remove domain sharding — consolidate assets to a single origin to maximize multiplexing
- Remove concatenation hacks — sprite sheets and JS bundles can be split into individual files
- Stop inlining small resources — they can be served as separate streams now
- Verify the CDN/load balancer supports end-to-end HTTP/2 — many only terminate it at the edge
# Verify HTTP/2 support
curl -v --http2 https://example.com 2>&1 | grep "< HTTP/"
# Should show: < HTTP/2 200
# Check ALPN negotiation (expect "ALPN protocol: h2" in the output)
openssl s_client -connect example.com:443 -alpn h2 </dev/null
HTTP/2 was a massive leap forward, but its reliance on TCP left one critical problem unsolved. That's where HTTP/3 and QUIC enter the picture.
Key Points
- HTTP/2 multiplexes all requests over a single TCP connection, eliminating the need for domain sharding
- HPACK header compression reduces header overhead by 85-90% compared to HTTP/1.1's repeated text headers
- Server push sounded great in theory but has been removed from major browsers due to poor real-world performance
- Stream prioritization lets clients hint which resources matter most, but server implementations vary wildly
- TCP-level head-of-line blocking still exists — a single lost packet blocks ALL streams on the connection
Key Components
| Component | Role |
|---|---|
| Binary Framing Layer | Encodes all HTTP messages into binary frames with type, stream ID, and flags |
| Streams | Independent bidirectional sequences of frames within a single TCP connection |
| HPACK Compression | Stateful header compression using static/dynamic tables to eliminate redundant header bytes |
| Server Push | Allows the server to proactively send resources before the client requests them |
| Flow Control | Per-stream and per-connection windowing to prevent fast senders from overwhelming slow receivers |
When to Use
Use HTTP/2 for any public-facing web service. The multiplexing benefit is most pronounced when serving many small resources (APIs, SPAs with dozens of assets). For internal services, it shines with gRPC or high-fanout request patterns.
Tool Comparison
| Tool | Type | Best For | Scale |
|---|---|---|---|
| Nginx | Open Source | HTTP/2 termination and reverse proxying with battle-tested performance | Millions of concurrent connections |
| Envoy Proxy | Open Source | HTTP/2 in service mesh environments with advanced observability | Cloud-native microservice architectures |
| Cloudflare | Managed | Automatic HTTP/2 at the edge with zero server-side config | Global CDN scale |
| HAProxy | Open Source | High-performance HTTP/2 load balancing with fine-grained control | Enterprise load balancing |
Debug Checklist
- Verify HTTP/2 is negotiated — check for h2 in the ALPN extension using curl --http2 -v
- Inspect stream IDs in Wireshark or Chrome DevTools Network tab (Protocol column shows h2)
- Check if HPACK dynamic table is being populated — initial requests will be larger
- Look for GOAWAY frames indicating the server is shutting down the connection
- Monitor stream reset (RST_STREAM) frames — they indicate per-stream errors without killing the connection
Common Mistakes
- Still using domain sharding with HTTP/2 — this hurts performance by splitting the single-connection advantage
- Assuming server push will speed up page loads — in practice it often pushes resources the client already has cached
- Not enabling HTTP/2 on the backend — many teams only enable it at the CDN edge, missing internal benefits
- Ignoring stream priorities — unoptimized servers treat all streams equally, defeating the purpose
- Thinking HTTP/2 requires TLS — the spec allows plaintext (h2c), though browsers mandate TLS in practice
Real World Usage
- Major websites saw 15-30% page load improvement by switching from HTTP/1.1 to HTTP/2
- gRPC uses HTTP/2 as its transport layer, leveraging multiplexing for bidirectional streaming
- CDNs like Cloudflare and Fastly terminate HTTP/2 from clients and can use it to communicate with origins
- Internal microservice communication benefits from HTTP/2's multiplexing to reduce connection overhead
- Mobile apps benefit significantly from HTTP/2's single connection, reducing battery drain from connection management