WebSocket Protocol
WebSocket upgrades an HTTP connection to full-duplex bidirectional communication, enabling real-time messaging with minimal frame overhead.
The Problem
HTTP's request-response model requires the client to initiate every interaction. How can the server push data to the client instantly, without polling, while maintaining a persistent, low-overhead connection?
Mental Model
Like upgrading from a walkie-talkie (HTTP request-response, one side at a time) to a phone call (both sides can talk whenever they want).
How It Works
WebSocket solves a fundamental limitation of HTTP: the server can't talk to the client unless the client asks first. With WebSocket, either side can send a message at any time over a persistent, full-duplex TCP connection.
The protocol has two phases: a handshake (which is HTTP) and data transfer (which is not HTTP at all).
The Upgrade Handshake
Every WebSocket connection starts as a regular HTTP/1.1 request with special headers:
GET /chat HTTP/1.1
Host: server.example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13
Origin: http://example.com
Sec-WebSocket-Protocol: chat, superchat
The server responds with 101 Switching Protocols:
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
Sec-WebSocket-Protocol: chat
The Sec-WebSocket-Accept value is computed by concatenating the client's Sec-WebSocket-Key with a magic GUID (258EAFA5-E914-47DA-95CA-C5AB0DC85B11), SHA-1 hashing the result, and base64 encoding it. This isn't for security — it prevents misconfigured HTTP servers from accidentally accepting WebSocket connections.
After this handshake, the HTTP connection transforms into a WebSocket connection. The TCP socket stays open, but the protocol spoken over it is no longer HTTP.
Frame Format — Lightweight by Design
WebSocket frames are intentionally minimal. The smallest possible frame is 2 bytes of overhead (compared to HTTP headers that are often 500+ bytes):
0 1 2 3
0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
+-+-+-+-+-------+-+-------------+-------------------------------+
|F|R|R|R| opcode|M| Payload len | Extended payload length |
|I|S|S|S| (4) |A| (7) | (16/64) |
|N|V|V|V| |S| | (if payload len==126/127) |
| |1|2|3| |K| | |
+-+-+-+-+-------+-+-------------+-------------------------------+
| Masking-key (if MASK set) |
+-------------------------------+-------------------------------+
| Payload Data |
+---------------------------------------------------------------+
Opcodes define the frame type:
- 0x1 — Text frame (UTF-8)
- 0x2 — Binary frame
- 0x8 — Close
- 0x9 — Ping
- 0xA — Pong
- 0x0 — Continuation (for fragmented messages)
The FIN bit indicates whether this is the final frame of a message. Large messages can be fragmented across multiple frames, allowing control frames (ping/pong) to be interleaved.
Masking is required for all client-to-server frames. Each frame includes a 4-byte mask key, and the payload is XORed with it. This exists to prevent cache-poisoning attacks where a malicious client tricks a transparent proxy into caching attacker-controlled content. Server-to-client frames are NOT masked.
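The masking operation itself is trivial; a sketch (the key bytes here are the example masking key from RFC 6455, and because XOR is its own inverse, the same function both masks and unmasks):

```javascript
// XOR-mask a payload with a 4-byte masking key.
// Applying the same function twice restores the original bytes.
function maskPayload(payload, maskKey) {
  const out = Buffer.alloc(payload.length);
  for (let i = 0; i < payload.length; i++) {
    out[i] = payload[i] ^ maskKey[i % 4];
  }
  return out;
}

const key = Buffer.from([0x37, 0xfa, 0x21, 0x3d]);
const masked = maskPayload(Buffer.from('Hello'), key);
const unmasked = maskPayload(masked, key);
console.log(unmasked.toString()); // → Hello
```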
Building a Real-World WebSocket Server
Here's what a production WebSocket implementation actually looks like in Node.js:
const WebSocket = require('ws');

const server = new WebSocket.Server({ port: 8080 });

// Connection management
const clients = new Map();

server.on('connection', (ws, req) => {
  // generateId and subscribeToRoom are app-specific helpers, not part of ws
  const clientId = generateId();
  clients.set(clientId, ws);

  // Heartbeat — detect dead connections
  ws.isAlive = true;
  ws.on('pong', () => { ws.isAlive = true; });

  ws.on('message', (data) => {
    let msg;
    try {
      msg = JSON.parse(data);
    } catch {
      return; // ignore malformed frames instead of crashing the handler
    }
    switch (msg.type) {
      case 'chat':
        // Broadcast to all connected clients
        broadcast({ type: 'chat', from: clientId, text: msg.text });
        break;
      case 'subscribe':
        // Subscribe to a topic/room
        subscribeToRoom(clientId, msg.room);
        break;
    }
  });

  ws.on('close', (code, reason) => {
    clients.delete(clientId);
    console.log(`Client ${clientId} disconnected: ${code} ${reason}`);
  });

  ws.on('error', (err) => {
    console.error(`Client ${clientId} error:`, err);
    clients.delete(clientId);
  });
});

// Heartbeat interval — kill dead connections
const heartbeat = setInterval(() => {
  server.clients.forEach((ws) => {
    if (!ws.isAlive) return ws.terminate();
    ws.isAlive = false;
    ws.ping();
  });
}, 30000);

// Stop the timer when the server shuts down
server.on('close', () => clearInterval(heartbeat));

function broadcast(msg) {
  const data = JSON.stringify(msg);
  clients.forEach((ws) => {
    if (ws.readyState === WebSocket.OPEN) {
      ws.send(data);
    }
  });
}
Scaling WebSocket — The Hard Problem
Scaling HTTP is straightforward: add more stateless servers behind a load balancer. WebSocket connections are stateful — a client is connected to a specific server, and messages must reach that server. This changes everything.
The Fan-Out Problem
When user A sends a message, it needs to reach users B, C, and D who may be connected to different servers. This requires a message broker:
Server 1 (Users A, B) ←→ Redis Pub/Sub ←→ Server 2 (Users C, D)
                              ↕
                   Server 3 (Users E, F)
Redis Pub/Sub is the most common solution. Each server subscribes to relevant channels. When a message arrives on Server 1, it publishes to Redis, and all other servers receive it and forward to their connected clients.
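The fan-out pattern can be sketched with a tiny in-memory broker standing in for Redis (purely illustrative — a real deployment would use a Redis client library, and Broker, makeServer, and the message shapes here are invented for the example):

```javascript
// Minimal in-memory stand-in for Redis Pub/Sub, to show the fan-out shape.
class Broker {
  constructor() { this.subscribers = new Map(); } // channel -> Set of handlers
  subscribe(channel, handler) {
    if (!this.subscribers.has(channel)) this.subscribers.set(channel, new Set());
    this.subscribers.get(channel).add(handler);
  }
  publish(channel, message) {
    for (const handler of this.subscribers.get(channel) || []) handler(message);
  }
}

const broker = new Broker();
const delivered = [];

// Each WebSocket server subscribes once and forwards to its local clients.
function makeServer(name, localUsers) {
  broker.subscribe('chat', (msg) => {
    for (const user of localUsers) delivered.push(`${name}->${user}: ${msg}`);
  });
}

makeServer('server1', ['A', 'B']);
makeServer('server2', ['C', 'D']);

// A message published by server1 reaches users connected to server2 too.
broker.publish('chat', 'hello');
console.log(delivered.length); // → 4
```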
Connection Limits
A single server can handle 100K-1M WebSocket connections depending on message rate and payload size. The bottleneck is usually not CPU or memory but the event loop's ability to process messages. Key optimizations:
- Use binary frames (protobuf) instead of JSON to reduce parsing overhead
- Implement message batching for high-frequency updates
- Use connection compression (permessage-deflate extension) for text-heavy payloads
- Offload heavy processing to worker threads
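Message batching, for instance, can be sketched like this (the flush threshold, the Batcher class, and the batch envelope shape are illustrative, not from any particular library; a real implementation would also flush on a timer):

```javascript
// Buffer high-frequency updates and send them as one frame per flush.
class Batcher {
  constructor(send, maxBatch = 50) {
    this.send = send;        // e.g. ws.send
    this.maxBatch = maxBatch;
    this.buffer = [];
  }
  push(update) {
    this.buffer.push(update);
    if (this.buffer.length >= this.maxBatch) this.flush();
  }
  flush() {
    if (this.buffer.length === 0) return;
    this.send(JSON.stringify({ type: 'batch', updates: this.buffer }));
    this.buffer = [];
  }
}

const sent = [];
const batcher = new Batcher((data) => sent.push(data), 3);
['a', 'b', 'c', 'd'].forEach((u) => batcher.push(u));
batcher.flush(); // flush the remainder ('d')
console.log(sent.length); // → 2
```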
Reconnection Strategy
WebSocket has no built-in reconnection. When the connection drops (network change, server restart, proxy timeout), the application must handle it:
class ReconnectingWebSocket {
  constructor(url) {
    this.url = url;
    this.retryCount = 0;
    this.maxRetries = 10;
    this.lastMessageId = null; // updated as messages arrive, used for sync
    this.connect();
  }

  connect() {
    this.ws = new WebSocket(this.url);

    this.ws.onopen = () => {
      this.retryCount = 0;
      // Re-subscribe to channels, sync missed messages
      this.reconcileState();
    };

    this.ws.onmessage = (event) => {
      const msg = JSON.parse(event.data);
      if (msg.id) this.lastMessageId = msg.id; // track for reconnect sync
      // ...dispatch to application handlers
    };

    this.ws.onclose = (event) => {
      if (event.code !== 1000 && this.retryCount < this.maxRetries) {
        // Exponential backoff with jitter
        const delay = Math.min(1000 * Math.pow(2, this.retryCount), 30000);
        const jitter = delay * 0.2 * Math.random();
        setTimeout(() => this.connect(), delay + jitter);
        this.retryCount++;
      }
    };
  }

  reconcileState() {
    // Send last known message ID to server
    // Server replays missed messages
    this.ws.send(JSON.stringify({
      type: 'sync',
      lastMessageId: this.lastMessageId
    }));
  }
}
The reconcileState step is critical and often overlooked. When a client reconnects, it has missed messages. The server needs to know the last message the client received and replay everything since. Without this, users see gaps in their chat history or stale dashboard data.
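The server side of that sync step can be sketched as a bounded message history (the MessageHistory class, buffer size, and message shape are illustrative assumptions, not a prescribed design):

```javascript
// Bounded history of broadcast messages, for replaying to reconnecting clients.
class MessageHistory {
  constructor(maxSize = 1000) {
    this.maxSize = maxSize;
    this.messages = []; // [{ id, payload }] in ascending id order
    this.nextId = 1;
  }
  record(payload) {
    this.messages.push({ id: this.nextId++, payload });
    if (this.messages.length > this.maxSize) this.messages.shift(); // drop oldest
  }
  // Everything the client has not yet seen.
  since(lastMessageId) {
    return this.messages.filter((m) => m.id > lastMessageId);
  }
}

const history = new MessageHistory();
['one', 'two', 'three'].forEach((p) => history.record(p));

// A client reconnects claiming it last saw message 1.
const missed = history.since(1);
console.log(missed.map((m) => m.payload)); // → [ 'two', 'three' ]
```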
WebSocket vs the Alternatives
| Feature | WebSocket | SSE | Long Polling |
|---|---|---|---|
| Direction | Bidirectional | Server → Client only | Client-initiated |
| Protocol | Binary frames over TCP | HTTP text stream | HTTP request-response |
| Overhead | 2-14 bytes/frame | ~50 bytes/event | Full HTTP headers each time |
| Reconnection | Manual | Built-in (EventSource) | N/A (new request each time) |
| Browser support | Universal | Universal (except old IE) | Universal |
| Proxy-friendly | Needs Upgrade support | Works everywhere | Works everywhere |
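The "2-14 bytes/frame" figure in the table follows directly from the frame layout: 2 base header bytes, plus a 2- or 8-byte extended length field when the payload exceeds 125 or 65,535 bytes, plus a 4-byte masking key on client-to-server frames. A small sketch:

```javascript
// Bytes of framing overhead for a given payload size and direction.
function frameOverhead(payloadLen, fromClient) {
  let bytes = 2; // base header: FIN/RSV/opcode + mask bit/7-bit length
  if (payloadLen > 65535) bytes += 8;      // 64-bit extended length
  else if (payloadLen > 125) bytes += 2;   // 16-bit extended length
  if (fromClient) bytes += 4;              // masking key (client frames only)
  return bytes;
}

console.log(frameOverhead(100, false));    // → 2  (small server-to-client frame)
console.log(frameOverhead(100000, true));  // → 14 (large client-to-server frame)
```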
Choose WebSocket when: the use case requires bidirectional communication, high message frequency (>1/sec), or binary data. Think chat, gaming, collaborative editing, live trading.
Choose SSE when: only server-to-client push is needed with low-to-medium frequency. Think notifications, live feeds, dashboards. SSE is simpler to deploy and has built-in reconnection.
Choose long polling as a last resort — when WebSocket is blocked by corporate firewalls and SSE doesn't meet the requirements.
Key Points
- WebSocket provides true full-duplex communication — both client and server can send messages independently at any time
- The protocol starts as HTTP and upgrades, making it firewall-friendly and compatible with existing infrastructure
- Client-to-server frames MUST be masked (XOR with a random key) to prevent cache poisoning attacks on proxies
- WebSocket has no built-in reconnection — the application must implement retry logic, exponential backoff, and state reconciliation
- A single WebSocket connection can carry thousands of messages per second with minimal overhead (2-14 bytes per frame)
Key Components
| Component | Role |
|---|---|
| HTTP Upgrade Handshake | Elevates a standard HTTP connection to a WebSocket connection via Upgrade and Connection headers |
| Frame Format | Lightweight binary framing with opcode, masking bit, payload length, and optional mask key |
| Ping/Pong Heartbeat | Keep-alive mechanism to detect dead connections — either side can send a Ping and must receive a Pong |
| Close Handshake | Graceful shutdown with close frames carrying status codes, ensuring both sides agree the connection is done |
| Message Fragmentation | Large messages can be split across multiple frames, allowing interleaving with control frames |
When to Use
Use WebSocket for true bidirectional, low-latency communication: chat, live dashboards, collaborative editing, gaming, financial data streams. Avoid it for simple server-to-client push (use SSE instead) or request-response patterns (use HTTP).
Tool Comparison
| Tool | Type | Best For | Scale |
|---|---|---|---|
| Socket.IO | Open Source | WebSocket with automatic fallback to long-polling, rooms, and namespaces | Small to medium real-time apps |
| ws (Node.js) | Open Source | Lightweight, spec-compliant WebSocket implementation with no abstractions | High-performance Node.js servers |
| Gorilla WebSocket | Open Source | Production Go WebSocket server with compression and connection management | High-concurrency Go services |
| SignalR | Open Source | .NET real-time framework with automatic transport negotiation and hub abstraction | Enterprise .NET applications |
Debug Checklist
- Inspect the HTTP upgrade request in browser DevTools — look for 101 Switching Protocols response
- Check WebSocket frames in Chrome DevTools Network tab → WS sub-tab for message content and timing
- Verify ping/pong frames are being exchanged — use wscat to connect and monitor
- Check for proxy/load balancer WebSocket support — Nginx needs proxy_set_header Upgrade and Connection
- Monitor close frame status codes — 1000 (normal), 1001 (going away), 1006 (abnormal, no close frame received)
Common Mistakes
- Not implementing heartbeat/ping-pong — without it, dead connections go undetected for hours
- Assuming WebSocket connections survive network changes — they don't, unlike QUIC/HTTP/3
- Sending JSON when binary protobuf would halve the bandwidth — WebSocket supports both text and binary frames
- Not handling reconnection logic — the protocol has no auto-reconnect, the application must build it
- Running WebSocket behind a load balancer without sticky sessions — connections can't be seamlessly moved between servers
Real World Usage
- Slack uses WebSocket for real-time message delivery to connected clients
- Trading platforms (Binance, Coinbase) stream live market data over WebSocket
- Google Docs uses WebSocket for real-time collaborative editing and cursor tracking
- Figma streams design operations over WebSocket for multi-user collaboration
- Online gaming platforms use WebSocket for low-latency game state synchronization