WebSocket & Real-Time Communication
Architecture Diagram
Why It Exists
HTTP makes the client ask for everything. Every single interaction starts with a request. That works fine for loading pages, but it falls apart for real-time features: notifications, chat, live dashboards, collaborative editing, multiplayer games. The server has data right now and needs to push it immediately.
Polling wastes bandwidth and adds noticeable latency. Long-polling ties up server threads waiting around. WebSocket takes a different approach. After a single HTTP Upgrade handshake, the result is a persistent, full-duplex TCP connection. Both sides can send frames whenever they want. The framing overhead is 2-14 bytes per frame versus hundreds of bytes in HTTP headers. That difference matters a lot when sending thousands of small messages per second.
How It Works
Protocol Comparison
| Protocol | Direction | Connection | Overhead | Browser Support | Use Case |
|---|---|---|---|---|---|
| HTTP Polling | Client-pull | New connection per poll | High (headers every request) | Universal | Simple, infrequent updates |
| Long-Polling | Client-pull (held) | Held open, re-established | Medium | Universal | Fallback when WS unavailable |
| SSE | Server-push | Persistent HTTP/2 stream | Low (text/event-stream) | All modern browsers | Notifications, feeds, dashboards |
| WebSocket | Bidirectional | Persistent TCP | Minimal (2-14 byte frame) | All modern browsers | Chat, gaming, collaboration |
A quick opinion: SSE is underrated. If the data flow is one-directional (server to client), SSE is simpler to operate, works through HTTP/2 without special proxy config, and handles reconnection automatically. Reach for WebSocket when bidirectional communication is actually needed.
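For the one-directional case, the browser side of SSE is only a few lines. A minimal sketch, assuming a hypothetical /events endpoint that serves text/event-stream:

```typescript
// Minimal browser-side SSE consumer. EventSource reconnects automatically
// on network errors, so there is no hand-rolled retry loop here.
// The /events endpoint and the "notification" event name are illustrative.
const source = new EventSource("/events");

// Unnamed events (plain `data:` lines) arrive via onmessage.
source.onmessage = (e: MessageEvent) => {
  console.log("event:", e.data);
};

// Named events (stream lines like `event: notification`).
source.addEventListener("notification", (e) => {
  const payload = JSON.parse((e as MessageEvent).data);
  console.log("notification:", payload);
});

source.onerror = () => {
  // EventSource goes back to CONNECTING and retries on its own.
  console.warn("SSE connection lost; the browser will retry");
};
```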
WebSocket Lifecycle
- HTTP Upgrade Handshake. The client sends a `GET` with `Upgrade: websocket` and a `Sec-WebSocket-Key`. The server responds with `101 Switching Protocols` and a `Sec-WebSocket-Accept` header (a SHA-1 hash proving it read the key). After this, the TCP connection switches from HTTP framing to WebSocket framing. The HTTP path is over.
- Data Transfer. Both sides exchange frames: text frames (UTF-8), binary frames, ping/pong for keepalive, and close frames. Framing is lightweight. A small message adds only 2 bytes of overhead. Messages can also be fragmented across multiple frames to stream large payloads without buffering the whole thing in memory.
- Keepalive. The server sends `ping` frames at a regular interval, typically every 30-60 seconds. The client must respond with `pong`. No pong within the timeout? The server closes the connection. This is how silently-dropped TCP connections from NAT timeouts, mobile network switches, or client crashes get caught. Without this, ghost connections accumulate and metrics become unreliable. (A keepalive sketch follows this list.)
- Close Handshake. Either side sends a close frame with a status code (1000 = normal, 1001 = going away, 1008 = policy violation). The other side responds with its own close frame, then both tear down the TCP socket. Clean and orderly.
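A minimal sketch of the keepalive and termination logic using the ws library for Node.js. The interval constant and the echo handler are illustrative; a real server would hang routing, auth, and registry updates off the same events.

```typescript
import { WebSocketServer, WebSocket } from "ws";

const KEEPALIVE_INTERVAL_MS = 30_000; // below typical NAT idle timeouts

// Track liveness per socket without monkey-patching the ws types.
const alive = new WeakMap<WebSocket, boolean>();

const wss = new WebSocketServer({ port: 8080 });

wss.on("connection", (ws) => {
  alive.set(ws, true);
  ws.on("pong", () => alive.set(ws, true)); // client answered our ping

  ws.on("message", (data) => {
    // Echo as a text frame; a real handler would route by message type.
    ws.send(data.toString());
  });
});

// Keepalive sweep: anything that missed the previous ping is presumed dead.
const sweep = setInterval(() => {
  for (const ws of wss.clients) {
    if (alive.get(ws) === false) {
      ws.terminate(); // no close handshake; the peer is already gone
      continue;
    }
    alive.set(ws, false);
    ws.ping();
  }
}, KEEPALIVE_INTERVAL_MS);

wss.on("close", () => clearInterval(sweep));
```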
The Connection Routing Problem
This is where most teams hit a wall when scaling past a handful of servers.
The core challenge: an event can arrive at any server, but the target client is connected to one specific pod. A notification for user X hits the ingestion API, but user X's WebSocket lives on Pod 47 out of 200 pods. How does it get there?
Solution: Redis Connection Registry + Pub/Sub
- When a client connects, the WebSocket pod writes `{user_id -> pod_id}` into Redis with a TTL matching the keepalive interval
- When an event targets user X, the producer looks up `user_id -> pod_id` in Redis. O(1) lookup.
- The producer publishes the event to a Redis Pub/Sub channel for that specific pod (e.g., `ws:pod:47`)
- Pod 47 picks up the event and pushes it down user X's WebSocket from its local connection map
This provides O(1) lookup + O(1) delivery. No broadcast waste. I have seen teams skip this step and just fan out every event to every pod, which technically works but burns CPU and bandwidth at O(N) per event. It becomes painful fast past 20-30 pods.
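A minimal sketch of this registry-plus-pub/sub path, assuming the ioredis client. The key pattern ws:user:*, the channel pattern ws:pod:*, and POD_ID are illustrative names, not a prescribed schema.

```typescript
import Redis from "ioredis";

const POD_ID = process.env.POD_ID ?? "pod-47"; // illustrative
const REGISTRY_TTL_SECONDS = 60;               // match the keepalive interval

const redis = new Redis();      // commands: SET / GET / PUBLISH
const subscriber = new Redis(); // dedicated connection for SUBSCRIBE

// On connect: register user -> pod with a TTL, refreshed on every pong.
async function registerConnection(userId: string): Promise<void> {
  await redis.set(`ws:user:${userId}`, POD_ID, "EX", REGISTRY_TTL_SECONDS);
}

// Producer side: O(1) lookup, then publish to the owning pod's channel.
async function routeEvent(userId: string, event: object): Promise<void> {
  const podId = await redis.get(`ws:user:${userId}`);
  if (!podId) return; // user offline, or the registry entry expired
  await redis.publish(`ws:pod:${podId}`, JSON.stringify({ userId, event }));
}

// Pod side: subscribe to our own channel and push to the local socket.
const localConnections = new Map<string, { send(data: string): void }>();

subscriber.subscribe(`ws:pod:${POD_ID}`).catch(console.error);
subscriber.on("message", (_channel, message) => {
  const { userId, event } = JSON.parse(message);
  localConnections.get(userId)?.send(JSON.stringify(event));
});
```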
Production Considerations
Load Balancing for Persistent Connections
WebSocket connections are long-lived, and that breaks a fundamental assumption of round-robin HTTP load balancing. Adding a new pod during scale-out means it gets zero existing connections while old pods stay overloaded. The balancer looks like it is distributing evenly, but the actual connection distribution is wildly skewed.
- Use L4 (TCP) load balancing for WebSocket traffic. L7 HTTP load balancers often buffer the Upgrade request or enforce HTTP timeouts that kill idle connections. AWS NLB, HAProxy in TCP mode, or Envoy with `upgrade_configs` all handle this correctly.
- Least-connections algorithm. This routes new connections to whichever pod has the fewest active connections, so things balance out naturally over time. Round-robin is the wrong choice here.
- Connection limits per pod. Set a hard max (e.g., 50K per pod). When a pod hits the cap, it fails its health check so the load balancer stops sending new connections. Without this, a single pod will accumulate connections until it runs out of memory. (One way to wire this up is sketched below.)
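A sketch of wiring the per-pod cap into the health check, using Node's http module and the ws library. The 50K cap and the /healthz path echo the numbers above and are assumptions, not requirements.

```typescript
import http from "node:http";
import { WebSocketServer } from "ws";

const MAX_CONNECTIONS = 50_000; // hard per-pod cap from capacity planning
const wss = new WebSocketServer({ noServer: true });

const server = http.createServer((req, res) => {
  if (req.url === "/healthz") {
    // Fail the health check at the cap so the balancer stops sending new
    // connections here; existing connections are left alone.
    const healthy = wss.clients.size < MAX_CONNECTIONS;
    res.writeHead(healthy ? 200 : 503).end(healthy ? "ok" : "at capacity");
    return;
  }
  res.writeHead(404).end();
});

server.on("upgrade", (req, socket, head) => {
  if (wss.clients.size >= MAX_CONNECTIONS) {
    socket.destroy(); // refuse the handshake outright once over the cap
    return;
  }
  wss.handleUpgrade(req, socket, head, (ws) => wss.emit("connection", ws, req));
});

server.listen(8080);
```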
Graceful Connection Draining
During a rolling deployment, killing a pod drops all its WebSocket connections at once. If every client reconnects simultaneously, the remaining pods see a connection spike (thundering herd) and can cascade-fail. I have watched this take down a production cluster.
Drain protocol:
- Mark the pod as draining (stop accepting new connections via health check)
- Send a custom `RECONNECT` frame to connected clients with a random delay hint (0-30s); see the sketch after this list
- Clients disconnect and reconnect to a new pod at a staggered time
- Wait for the drain timeout (e.g., 60s), then terminate remaining connections
- Pod shuts down
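A sketch of that drain sequence against a ws server. The RECONNECT message shape, the delay hint, and the timeout values are illustrative; the important part is staggering client departures before force-closing stragglers with status 1001.

```typescript
import { WebSocketServer } from "ws";

const DRAIN_TIMEOUT_MS = 60_000;  // matches the termination grace period
const MAX_DELAY_HINT_MS = 30_000; // spread reconnects over 0-30s

let draining = false; // the health check should return 503 once this flips

async function drain(wss: WebSocketServer): Promise<void> {
  draining = true; // step 1: stop accepting new connections

  // Step 2: tell every client to reconnect elsewhere at a staggered time.
  for (const ws of wss.clients) {
    const delayMs = Math.floor(Math.random() * MAX_DELAY_HINT_MS);
    ws.send(JSON.stringify({ type: "RECONNECT", delayMs }));
  }

  // Step 3: give clients the drain window to move on their own.
  await new Promise((resolve) => setTimeout(resolve, DRAIN_TIMEOUT_MS));

  // Step 4: close whatever is left with a "going away" status code.
  for (const ws of wss.clients) {
    ws.close(1001, "server draining");
  }
}
```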
In Kubernetes, set terminationGracePeriodSeconds to match the drain timeout. Use a preStop hook to trigger the drain logic before the pod receives SIGTERM. This is one of those things that seems optional until it causes an outage at 2 AM.
Scaling to 100M Concurrent Connections
At notification-system scale (100M concurrent connections), the architecture looks like this:
- 2,000-4,000 WebSocket pods at 25K-50K connections each
- Redis Cluster for the connection registry (100M keys at ~100 bytes each = ~10 GB, which is manageable)
- Sharded Redis Pub/Sub. A single Redis Pub/Sub instance tops out around 500K messages/sec. Shard by pod ID hash to spread the load.
- Kernel tuning: bump `net.core.somaxconn`, `fs.file-max`, `net.ipv4.tcp_tw_reuse`. Every connection uses a file descriptor, and the kernel default of 1024 is laughably low for this.
- Memory budget: each idle WebSocket connection eats ~10-50 KB (buffers, TLS state, application state). At 50K connections per pod, that's 500 MB to 2.5 GB just on connection overhead before the application does anything useful.
Most teams will never hit this scale. But knowing the ceiling matters because it shows where the architecture starts to buckle and what needs investment ahead of time.
Failure Scenarios
Scenario 1: NAT/Proxy Timeout Causes Silent Connection Drops. Corporate firewalls and cloud NAT gateways silently kill idle TCP connections after 60-300 seconds. Without application-level keepalive, the server thinks the connection is alive while packets are being blackholed. The connection registry says the user is online, but events sent to them vanish. Active-user metrics quietly inflate with ghost connections.
Detection: monitor the ratio of pong responses to ping sends and alert when the pong rate drops below 95%. Track connections_cleaned_up_by_timeout as a separate metric.
Recovery: run server-side pings every 30 seconds (below most NAT timeouts). On timeout, remove the connection from both the local map and the Redis registry immediately. Client-side, implement reconnection with exponential backoff (1s, 2s, 4s, max 30s) plus jitter to avoid creating a mini thundering herd.
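A browser-side sketch of the backoff-plus-jitter reconnect described above; the endpoint URL and constants are illustrative.

```typescript
const BASE_DELAY_MS = 1_000;
const MAX_DELAY_MS = 30_000;

let attempt = 0;

function connect(): void {
  const ws = new WebSocket("wss://example.com/ws"); // illustrative endpoint

  ws.onopen = () => {
    attempt = 0; // reset backoff once a connection actually succeeds
  };

  ws.onclose = () => {
    // Exponential backoff capped at 30s, plus up to 1s of jitter so a fleet
    // of clients does not reconnect in lockstep.
    const delay =
      Math.min(BASE_DELAY_MS * 2 ** attempt, MAX_DELAY_MS) +
      Math.random() * 1_000;
    attempt += 1;
    setTimeout(connect, delay);
  };
}

connect();
```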
Scenario 2: Redis Registry Failure Breaks Connection Routing. The Redis cluster used for connection routing goes through a failover. During the 10-30 second failover window, new connections cannot be registered and event routing lookups fail. Events queue up but cannot be delivered. After failover, the new Redis primary has an empty or stale dataset. All existing connections look unroutable.
Detection: monitor redis_connection_registry_errors and event delivery failure rate. Alert when delivery drops below 99%.
Recovery: each WebSocket pod keeps a local connection map as the source of truth. After Redis comes back, pods re-register all their active connections in a background sweep (rate-limit this to avoid hammering the fresh Redis instance). During the outage, fall back to broadcasting events to all pods. It is O(N) and wasteful, but it keeps events flowing. For extra safety, set up dual-write to a secondary Redis cluster so failover is instant.
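A sketch of the post-failover re-registration sweep, reusing the ioredis registry layout from earlier. The batch size and pause are illustrative rate limits, not tuned values.

```typescript
import Redis from "ioredis";

const redis = new Redis();
const POD_ID = process.env.POD_ID ?? "pod-47"; // illustrative
const REGISTRY_TTL_SECONDS = 60;
const BATCH_SIZE = 500;      // cap the write rate against a fresh primary
const BATCH_PAUSE_MS = 100;

// localConnections is this pod's in-memory source of truth: userId -> socket.
async function reregisterAll(localConnections: Map<string, unknown>): Promise<void> {
  const userIds = [...localConnections.keys()];
  for (let i = 0; i < userIds.length; i += BATCH_SIZE) {
    const batch = userIds.slice(i, i + BATCH_SIZE);
    const pipeline = redis.pipeline();
    for (const userId of batch) {
      pipeline.set(`ws:user:${userId}`, POD_ID, "EX", REGISTRY_TTL_SECONDS);
    }
    await pipeline.exec();
    // Pause between batches so a few thousand pods sweeping at once do not
    // hammer the recovering Redis primary.
    await new Promise((resolve) => setTimeout(resolve, BATCH_PAUSE_MS));
  }
}
```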
Scenario 3: Thundering Herd After Pod Crash. A WebSocket pod handling 50K connections dies unexpectedly (OOM, node failure). All 50K clients detect the disconnect and try to reconnect at once. The load balancer distributes these across remaining pods, creating a 50K-connection spike. Pods that absorb too many reconnections may OOM themselves, creating a cascading failure.
Detection: monitor connections_per_second and alert when it exceeds 5x the normal rate. Track per-pod connection count variance.
Recovery: the real fix is client-side reconnection with jitter: reconnect_delay = min(base * 2^attempt + random(0, 1000ms), 30s). Server-side, add connection rate limiting per pod (e.g., max 1,000 new connections/second). In Kubernetes, use pod disruption budgets to prevent multiple pods from going down at the same time during voluntary operations.
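A sketch of per-pod connection rate limiting as a one-second token bucket; the 1,000/second figure mirrors the example above and is not a recommendation.

```typescript
const MAX_NEW_CONNECTIONS_PER_SECOND = 1_000;

let tokens = MAX_NEW_CONNECTIONS_PER_SECOND;

// Refill the bucket once a second; unused capacity does not accumulate.
setInterval(() => {
  tokens = MAX_NEW_CONNECTIONS_PER_SECOND;
}, 1_000);

// Call this in the HTTP upgrade handler before completing the handshake.
// Returning false should translate into destroying the socket (or a 503),
// which pushes the client into its backoff-and-retry path.
function allowNewConnection(): boolean {
  if (tokens <= 0) return false;
  tokens -= 1;
  return true;
}
```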
Capacity Planning
Connection density formula: pods_needed = total_connections / connections_per_pod. The right connections-per-pod number depends on the message rate, message size, and available memory. Profile this with realistic traffic before picking a number.
| Scale Tier | Concurrent Connections | Pods (50K/pod) | Redis Registry | Bandwidth | Reference |
|---|---|---|---|---|---|
| Startup | 10K | 1-2 | Single Redis | 10 Mbps | Early-stage app |
| Mid-scale | 500K | 10-15 | Redis Sentinel | 500 Mbps | Chat/collab platform |
| Large-scale | 10M | 200-400 | Redis Cluster (3 shards) | 10 Gbps | Gaming/trading platform |
| Hyper-scale | 100M+ | 2,000-4,000 | Redis Cluster (10+ shards) | 100+ Gbps | Notification system |
Key thresholds to watch: The kernel file descriptor limit defaults to 1024. Set it to 1M+ for WebSocket servers or the limit will be hit fast. Each TLS connection adds ~30 KB of memory overhead, so at 50K connections, that is 1.5 GB just for TLS state. Redis Pub/Sub maxes out around 500K messages/sec per shard, so partition by pod ID hash. And keep an eye on TIME_WAIT socket accumulation during connection churn. Enable tcp_tw_reuse to reclaim ports faster, otherwise ephemeral port exhaustion will occur under load.
Key Points
- WebSocket provides full-duplex, persistent connections over a single TCP socket, killing HTTP polling overhead entirely
- Connection lifecycle: HTTP Upgrade handshake, bidirectional frames, ping/pong keepalive, close handshake
- SSE (Server-Sent Events) is the simpler option for server-to-client push. It works with HTTP/2 and auto-reconnects out of the box
- A connection registry (Redis) solves the routing problem: an event can land on any server, but the connection lives on one specific pod
- Graceful connection draining during deploys prevents mass reconnect thundering herds
Tool Comparison
| Tool | Type | Best For | Scale |
|---|---|---|---|
| Socket.IO | Open Source | Auto-reconnect, fallback transports, rooms/namespaces | Small-Medium |
| ws (Node.js) | Open Source | Minimal WebSocket server, high performance, no abstraction | Medium-Enterprise |
| Envoy/NGINX | Open Source | L4/L7 proxy with WebSocket support, connection draining | Medium-Enterprise |
| AWS API Gateway WebSocket | Managed | Serverless WebSocket with Lambda integration, connection management | Small-Large |
Common Mistakes
- Using L7 HTTP load balancers that buffer requests. WebSocket needs L4 (TCP) pass-through or L7 with explicit upgrade support.
- Broadcasting events to all pods instead of routing to the right one. That is O(N) fan-out when O(1) targeted delivery is possible.
- Skipping ping/pong keepalive. Silent TCP connection drops go undetected and stale connections pile up.
- Deploying without graceful drain. A rolling restart drops all connections at once and triggers a thundering herd reconnect storm.
- Storing connection state only in local memory. When a pod dies, all connection metadata is gone with no recovery path.