WebSocket Gateway
Architecture Diagram
Why It Exists
A load balancer like AWS ALB terminates TLS and routes the initial HTTP connection to a backend. For regular HTTP requests, that is the whole story. But WebSocket connections are different. After the HTTP Upgrade handshake, the protocol switches from request/response to persistent bidirectional frames. At that point, ALB stops understanding what is flowing through the connection. It becomes a byte-forwarding proxy, passing raw TCP payload between client and server without parsing, inspecting, or routing based on content.
Something needs to sit on the other side of ALB and actually manage these connections. That something is the WebSocket gateway. It tracks which clients are connected, routes connections to the correct backend server, and manages the connection lifecycle (keepalive, timeouts, draining). How deeply it inspects traffic depends on the pattern: a proxy gateway forwards WebSocket frames as raw bytes after routing at connection time, while a terminating gateway decodes every frame and makes per-message routing decisions. Most collaboration systems use the proxy pattern. Both patterns exist because without a gateway layer, sync servers or application servers are directly exposed to connection management, TLS overhead, and scaling concerns that have nothing to do with application logic.
How It Works
The Production Stack
A typical production architecture for WebSocket-heavy applications has four layers, each with a distinct responsibility:
| Layer | Component | Responsibility |
|---|---|---|
| Edge | Client (browser/app) | Opens WebSocket connection |
| Ingress | AWS ALB | TLS termination, routes HTTP Upgrade to gateway instances |
| Protocol | WebSocket Gateway | Manages connections, decodes frames, understands application messages |
| Application | Sync Servers | Application logic (documents, chat, game state) |
| Coordination | Redis / Database | Cross-node state synchronization |
Step 1 — Client Initiates Connection
The browser wants to connect to wss://collab.example.com/ws. It sends a standard HTTPS request:
GET /ws HTTP/1.1
Host: collab.example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13
Authorization: Bearer <JWT>
This is still normal HTTP at this point. Nothing special has happened yet.
Step 2 — TLS Termination at ALB
The connection between client and ALB is encrypted with TLS (wss://). ALB decrypts the traffic, which means everything after ALB travels as plaintext HTTP within the VPC. ALB can now inspect the HTTP headers: it can see the Upgrade: websocket header, the Authorization bearer token, the Host, and the path. It uses these to decide which target group (which set of gateway instances) to route the request to.
Step 3 — ALB Forwards the Upgrade Request
ALB forwards the HTTP request to a WebSocket gateway instance. At this point the request is still HTTP. ALB selects a gateway instance using its target group's routing algorithm (typically least outstanding requests rather than round robin for WebSocket traffic). The gateway has not done anything yet.
Step 4 — WebSocket Upgrade at the Gateway
The gateway performs the protocol upgrade. It responds:
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
At this moment, HTTP ends and WebSocket begins. The connection becomes persistent and bidirectional. The client and gateway can now exchange WebSocket frames in both directions.
The physical network path is still Client ⇄ ALB ⇄ Gateway. ALB remains in the middle. Every single WebSocket frame, in both directions, passes through ALB for the lifetime of the connection. But ALB is no longer interpreting what it sees. It is forwarding bytes.
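The Sec-WebSocket-Accept value is not arbitrary: RFC 6455 defines it as the base64-encoded SHA-1 of the client's Sec-WebSocket-Key concatenated with a fixed GUID. A minimal sketch of that computation (the key below is the RFC 6455 sample value, the same one shown in the handshake above):

import base64
import hashlib

WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"  # fixed GUID from RFC 6455

def accept_key(sec_websocket_key: str) -> str:
    # SHA-1 over key + GUID, base64-encoded: the value the gateway returns
    # in the Sec-WebSocket-Accept header of the 101 response.
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")

print(accept_key("dGhlIHNhbXBsZSBub25jZQ=="))  # → s3pPLMBiTxaQ9kYGzzhZRbK+xOo=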
Step 5 — Connection Lifecycle Management
The WebSocket connection stays open for a long time, potentially hours or days. The gateway manages the entire lifecycle:
- Connection tracking. The gateway maintains a registry mapping socket connections to users: { socket_123: userA, socket_124: userB }. This is where the system knows which user is connected and on which gateway instance.
- Ping/pong keepalive. The gateway sends ping frames every 30 seconds. If a client does not respond with pong within the timeout, the gateway closes the connection and removes it from the registry. This catches silent TCP drops from NAT timeouts, mobile network switches, and client crashes.
- Timeouts and reconnections. The gateway enforces idle timeouts, handles graceful close handshakes, and processes reconnection attempts (validating tokens again on reconnect).
ALB does none of this. It does not manage WebSocket sessions. It does not track connections. It forwards packets.
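A minimal sketch of that lifecycle management in Python with asyncio. The registry and keepalive loop below are illustrative: the ws object is assumed to expose ping() and close() coroutines (most WebSocket server libraries provide something equivalent), and PING_INTERVAL matches the 30-second figure above:

import asyncio

PING_INTERVAL = 30    # seconds between pings, as described above
PONG_TIMEOUT = 10     # how long to wait for a pong before dropping the socket

class ConnectionRegistry:
    """Maps socket ids to users so the gateway can deliver messages per user."""

    def __init__(self):
        self.connections = {}                    # socket_id -> (user_id, ws)

    def register(self, socket_id, user_id, ws):
        self.connections[socket_id] = (user_id, ws)

    def unregister(self, socket_id):
        self.connections.pop(socket_id, None)

async def keepalive(registry: ConnectionRegistry):
    """Ping every tracked connection; drop the ones that stop answering."""
    while True:
        await asyncio.sleep(PING_INTERVAL)
        for socket_id, (user_id, ws) in list(registry.connections.items()):
            try:
                pong = await ws.ping()                       # assumed API
                await asyncio.wait_for(pong, PONG_TIMEOUT)   # wait for the pong
            except (asyncio.TimeoutError, ConnectionError):
                await ws.close()
                registry.unregister(socket_id)               # silent drop detected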
Step 6 — Application Message Routing
This is where the gateway earns its name.
A client sends a message:
{ "type": "join", "docId": "doc-abc-123" }
This becomes a WebSocket frame:
FIN=1, opcode=0x1 (text), payload={"type":"join","docId":"doc-abc-123"}
The frame travels: Client → ALB → Gateway.
What ALB sees: After TLS decryption, ALB sees raw bytes: 81 2A 7B 22 74 79 70 65 .... It treats this as TCP payload. It receives bytes and forwards bytes. No parsing. No inspection. No routing decisions.
What the gateway does: The gateway decodes the WebSocket frame, strips the framing header, and reads the application payload. It now understands the message:
if message.type == "join":
    doc_id = message.doc_id
    server = consistent_hash(doc_id)  # → sync-server-2
    route(message, server)
This is what "understanding application-level protocol" means. The gateway understands the meaning of messages, not just bytes. It can make routing decisions based on content. ALB cannot do this.
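The consistent_hash helper referenced above can be as small as a sorted hash ring over the sync-server names. A minimal sketch, assuming servers are identified by hostname strings (virtual nodes and replication are left out for brevity):

import bisect
import hashlib

class ConsistentHashRing:
    def __init__(self, servers):
        # Place each server at a point on the ring determined by its hash.
        self.ring = sorted((self._hash(s), s) for s in servers)
        self.points = [h for h, _ in self.ring]

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def server_for(self, doc_id: str) -> str:
        # Walk clockwise to the first server at or after the key's position.
        idx = bisect.bisect(self.points, self._hash(doc_id)) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["sync-server-1", "sync-server-2", "sync-server-3"])
ring.server_for("doc-abc-123")  # every gateway resolves this docId to the same server

The advantage over a plain hash(docId) % N (used as shorthand later in this section) is that adding or removing a server only remaps the keys adjacent to it on the ring.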
ALB vs WebSocket Gateway
| Aspect | ALB | WebSocket Gateway |
|---|---|---|
| TLS termination | Yes | No (ALB already did it) |
| WebSocket support | Passes through after HTTP Upgrade | Actively manages the connection |
| What it sees after upgrade | Raw bytes (81 2A 7B 22...) | Decoded application messages ({"type":"join","docId":"doc-123"}) |
| Routing | Routes the connection to a target group | Routes individual messages to specific sync servers based on content |
| Auth | Limited (Cognito for HTTP, not per-message) | Full — validates JWT at handshake, can authorize per-document access |
| Connection awareness | Health checks and draining | Tracks every connection per user, per document |
| Per-message logic | None — transparent TCP pipe | Inspect, throttle, reject, or route individual frames |
| Connection limits | Per-target-group, coarse | Per-instance, per-document, per-user — fine-grained |
The short version: ALB handles the first 200ms (TLS + routing the HTTP Upgrade). The gateway handles the next 2 hours (connection management, message routing, protocol intelligence).
Gateway Routing to Sync Servers
Once the gateway understands the application message, it routes to the correct sync server. With three sync servers:
docId = "doc-abc-123"
hash("doc-abc-123") % 3 → sync-server-2
Messages flow: Client → ALB → Gateway → Sync Server 2
The sync server handles application logic: CRDT document state, chat room management, game state updates. When the sync server produces a response or broadcast, the path reverses: Sync Server → Gateway → ALB → Client.
Gateway Patterns: Proxy vs Terminate
Everything above described the termination pattern, where the gateway decodes every WebSocket frame and understands the application message. But that is only one of two ways to build a WebSocket gateway. The choice between them is one of the most important architectural decisions for a real-time system.
Pattern A — Transparent Proxy
Client ⇄ WebSocket ⇄ Gateway ⇄ WebSocket ⇄ Sync Server
The gateway does not parse WebSocket frames. It forwards bytes. The WebSocket connection effectively spans the entire path from client to sync server, with the gateway acting as a transparent relay.
Routing happens once, during the initial HTTP Upgrade handshake. The client includes the routing key in the connection URL:
wss://collab.example.com/ws?docId=doc-abc-123
The gateway reads docId from the query parameter, hashes it to select a sync server, and establishes the upstream WebSocket connection. After that, every frame flows straight through. The gateway copies bytes in both directions without inspecting them.
| Component | Responsibility |
|---|---|
| Gateway | Connection-time routing, byte forwarding (plus TLS termination if no ALB sits in front) |
| Sync server | WebSocket protocol handling, application logic |
Work per message: copy bytes, forward bytes. Very cheap. The gateway is doing the same work as a TCP proxy.
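A minimal sketch of proxy-mode forwarding, assuming the Python websockets package for the upstream connection and the hash ring sketched earlier for routing. How the docId query parameter is read from the upgrade request differs between library versions, so it is shown here as an argument:

import asyncio
import websockets  # assumed: the Python "websockets" package

async def handle_client(client_ws, doc_id):
    # doc_id came from the ?docId= query parameter during the HTTP Upgrade.
    upstream_url = f"ws://{ring.server_for(doc_id)}:4000/ws"

    async with websockets.connect(upstream_url) as upstream_ws:
        # From here on the gateway is a relay: no decoding, no inspection.
        async def pump(src, dst):
            async for message in src:   # each frame/message is forwarded as-is
                await dst.send(message)

        await asyncio.gather(pump(client_ws, upstream_ws),
                             pump(upstream_ws, client_ws))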
Pattern B — WebSocket Termination
Client ⇄ WebSocket ⇄ Gateway ──→ internal protocol ──→ Sync Server
The gateway terminates the WebSocket connection. It decodes every frame, parses the application payload, and makes per-message routing decisions. The sync server does not speak WebSocket at all — it receives messages via an internal protocol (HTTP, gRPC, TCP, or a message queue).
# Gateway logic for every incoming message. decode_websocket_frame, consistent_hash,
# forward, rate_limit, and send_error are the gateway's own helpers.
import json

def handle_raw_frame(raw_bytes, client, user_id):
    frame = decode_websocket_frame(raw_bytes)
    message = json.loads(frame.payload)

    if message["type"] == "join":
        server = consistent_hash(message["docId"])
        forward(message, server)
    elif message["type"] == "edit":
        if not rate_limit.allow(user_id):
            send_error(client, "rate limited")
            return
        server = consistent_hash(message["docId"])
        forward(message, server)
Work per message: decode WebSocket frame, parse JSON/binary payload, apply routing logic, potentially validate, rate limit, or filter, then forward. Much more CPU per message than proxy mode.
| Component | Responsibility |
|---|---|
| Gateway | WebSocket termination, frame decoding, message parsing, per-message routing, rate limiting, filtering |
| Sync server | Application logic only (receives structured messages via internal protocol) |
Performance and Scaling
The scaling difference between the two patterns is significant:
| Aspect | Proxy Mode | Termination Mode |
|---|---|---|
| Gateway work scales with | O(connections) | O(messages) |
| CPU per message | Negligible (byte copy) | Measurable (decode + parse + route) |
| Bottleneck | File descriptors, memory | CPU, message throughput |
| Gateway complexity | Simple (stateless relay) | Complex (protocol-aware, stateful logic) |
Message volume is always much larger than connection volume. A single WebSocket connection might carry thousands of messages per minute. In proxy mode, the gateway does not care. In termination mode, every one of those messages costs CPU.
Think of it like a mail system. Proxy mode is a mail truck: it picks up packages and delivers them without opening them. Termination mode is a sorting center: it opens every package, reads the contents, decides where to send it. Opening every package is expensive.
When to Use Which
| Question | If yes | If no |
|---|---|---|
| Does the gateway need to inspect message content? | Terminate | Proxy |
| Is per-message routing needed (different messages to different servers)? | Terminate | Proxy |
| Is protocol translation needed (WebSocket to gRPC, HTTP, etc.)? | Terminate | Proxy |
| Is routing determined entirely at connection time? | Proxy | — |
| Is per-message rate limiting or filtering needed at the gateway? | Terminate | Proxy |
What Real Systems Use
Most collaboration and real-time systems use proxy mode:
- Figma — routes by file ID at connection time, then proxies
- Notion — routes by block/page at connection time
- Linear — routes by workspace at connection time
- Hocuspocus/Yjs setups — routes by document ID at connection time
The reason is consistent: routing happens once during the handshake. Once the client is connected to the right sync server, every subsequent message belongs to the same document. There is no need to inspect frames.
Termination mode is used by systems that need per-message intelligence:
- API gateways with WebSocket support (AWS API Gateway WebSocket dispatches each message to a Lambda)
- Trading systems (inspect every order, validate, route to matching engine)
- Real-time analytics pipelines (parse events, filter, fan out to multiple consumers)
- Protocol translation layers (WebSocket clients talking to gRPC backends)
Recommended Architecture for Collaborative Editing
For Hocuspocus/Yjs systems, the proxy pattern is the standard choice:
Client
│
▼
ALB (TLS termination)
│
▼
WebSocket Gateway (proxy mode, routes by docId at connection time)
│
▼
Hocuspocus Servers (WebSocket + CRDT logic)
│
▼
Redis (cross-instance sync via @hocuspocus/extension-redis)
The gateway reads docId from the connection URL, consistent-hashes it to a Hocuspocus instance, and forwards all subsequent frames without inspection. Hocuspocus handles the WebSocket protocol, Yjs sync, and document state. The gateway's only ongoing job is keeping the connection alive and forwarding bytes.
Production Considerations
Why This Layering Exists
Separating the gateway from sync servers enables independent scaling. The gateway tier is stateless (it holds open connections but no application state). The sync tier is stateful (it holds Yjs documents, chat rooms, or game state in memory). These have completely different scaling characteristics:
- Gateway scaling is driven by connection count. Each connection costs 10-50 KB of memory (socket buffers, framing state, per-connection metadata). At 50K connections per instance, a gateway pod uses 500 MB to 2.5 GB just for connection overhead.
- Sync server scaling is driven by compute and memory for application state. A Yjs document can be 15-100 KB in memory. CPU usage depends on edit frequency, not connection count.
At 500K concurrent connections, a typical setup runs 10-15 gateway instances and only 3-5 sync servers, because the bottleneck is different for each tier.
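A back-of-the-envelope check of those figures (just the arithmetic from the numbers above, not a benchmark):

connections_per_pod = 50_000
per_connection_kb = (10, 50)                 # low and high estimates from above

low_gb = connections_per_pod * per_connection_kb[0] / 1_000_000
high_gb = connections_per_pod * per_connection_kb[1] / 1_000_000
print(f"{low_gb:.1f} GB to {high_gb:.1f} GB per gateway pod")   # 0.5 GB to 2.5 GB

total_connections = 500_000
print(total_connections // connections_per_pod)  # 10 pods at the 50K ceiling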
Full End-to-End Flow
The message flows below are drawn for the termination pattern; in proxy mode the gateway's per-message steps reduce to forwarding frames to the sync server selected at connection time.
Connection creation:
Client
├── TLS handshake ──→ ALB
├── HTTP Upgrade ────→ ALB forwards to Gateway
├── 101 Switching ───← Gateway accepts upgrade
└── Persistent WebSocket connection established
Client-to-server message flow:
Client sends WebSocket frame
→ ALB forwards bytes (no parsing)
→ Gateway decodes frame, reads payload
→ Routes to correct sync server (hash on docId)
→ Sync server processes application logic
Server-to-client response flow:
Sync server produces update
→ Gateway receives, encodes as WebSocket frame
→ ALB forwards bytes
→ Client receives update
Broadcast flow (one edit reaching all other editors):
Sync server produces update for docId
→ Publishes to Redis Pub/Sub channel for docId
→ All gateways subscribed to that channel receive update
→ Each gateway sends to its locally connected clients for that document
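A minimal sketch of that broadcast leg, assuming redis-py's asyncio client (redis.asyncio); the channel naming and the local_clients lookup are illustrative:

import redis.asyncio as redis  # assumed: the redis-py package

r = redis.Redis()

async def publish_update(doc_id: str, update: bytes):
    # Sync server publishes the update for one document.
    await r.publish(f"doc:{doc_id}", update)

async def fan_out(doc_id: str, local_clients):
    # Each gateway subscribes to the channels for the documents it hosts and
    # forwards updates only to its locally connected clients.
    pubsub = r.pubsub()
    await pubsub.subscribe(f"doc:{doc_id}")
    async for msg in pubsub.listen():
        if msg["type"] != "message":
            continue
        for ws in local_clients(doc_id):   # assumed lookup: docId -> local sockets
            await ws.send(msg["data"])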
Failure Scenarios
Scenario 1: Gateway Pod Crashes. All WebSocket connections on that pod drop immediately. Clients detect the disconnect and attempt to reconnect. If all clients reconnect at once, the remaining gateway pods see a connection spike (thundering herd).
- Detection: monitor connections_per_second and alert when it exceeds 5x the normal rate.
- Recovery: client-side reconnection with exponential backoff plus jitter (delay = min(base * 2^attempt + random(0, 1000ms), 30s); see the sketch below). Server-side, enforce connection rate limiting per pod (max 1,000 new connections/second).
- Prevention: implement graceful drain, send a custom RECONNECT frame with a random delay hint before shutting down, and set Kubernetes terminationGracePeriodSeconds to match the drain timeout.
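A minimal client-side sketch of that backoff formula (the base delay, jitter range, and 30-second cap are the values quoted above):

import random

BASE_DELAY = 1.0   # seconds
MAX_JITTER = 1.0   # random(0, 1000ms)
MAX_DELAY = 30.0   # cap

def reconnect_delay(attempt: int) -> float:
    # delay = min(base * 2^attempt + random(0, 1000ms), 30s)
    return min(BASE_DELAY * 2 ** attempt + random.uniform(0, MAX_JITTER), MAX_DELAY)

[round(reconnect_delay(n), 1) for n in range(6)]  # roughly 1s, 2s, 4s, 8s, 16s, 30s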
Scenario 2: ALB Idle Timeout Kills WebSocket Connections. ALB has a default idle timeout of 60 seconds. If no data flows across a WebSocket connection for 60 seconds, ALB terminates it. The gateway and client both see an unexpected disconnect.
- Detection: monitor the connection duration distribution. If most connections die around the 60-second mark, the idle timeout is the cause.
- Recovery: raise the ALB idle timeout (a load-balancer-level attribute, configurable up to 4,000 seconds; 3,600 is a common choice for WebSocket traffic). On the application side, send ping frames every 30 seconds so connections are never idle. Do both.
Scenario 3: Gateway Cannot Reach Sync Servers. The network between the gateway and the sync server tier is disrupted (security group change, sync server pod crash, DNS failure). The gateway has open client connections but cannot route messages to any backend.
- Detection: monitor gateway-to-sync-server error rate and latency. Alert when the error rate exceeds 1%.
- Recovery: the gateway should buffer messages briefly (5-10 seconds) and retry, as sketched below. If the sync server remains unreachable, send an error frame to the client and let the client fall back to offline mode. The CRDT on the client side will hold all edits and sync when the path is restored.
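A minimal sketch of that buffer-and-retry behaviour; forward_to_sync_server and send_error_frame are illustrative helpers, with forward_to_sync_server assumed to raise ConnectionError on failure:

import asyncio

BUFFER_WINDOW = 10.0    # seconds to keep retrying before giving up
RETRY_INTERVAL = 1.0

async def forward_with_retry(message, server, client):
    deadline = asyncio.get_running_loop().time() + BUFFER_WINDOW
    while True:
        try:
            await forward_to_sync_server(message, server)   # assumed helper
            return
        except ConnectionError:
            if asyncio.get_running_loop().time() >= deadline:
                # Give up: tell the client so it can fall back to offline mode;
                # the client-side CRDT holds the edits until the path recovers.
                await send_error_frame(client, "sync server unreachable")
                return
            await asyncio.sleep(RETRY_INTERVAL)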
Key Points
- Sits between the load balancer and sync servers, owning the WebSocket connection and managing its lifecycle
- ALB terminates TLS and routes the initial HTTP Upgrade, but after the upgrade it becomes a byte-forwarding proxy. The gateway is where protocol intelligence lives
- Two patterns: proxy mode (forward bytes, route at connection time) vs termination mode (decode every frame, route per message). Most collaboration systems use proxy mode
- Maintains a connection registry mapping sockets to users, enabling targeted message delivery instead of broadcast
- Proxy mode scales at O(connections), termination mode at O(messages). Message volume is always much larger than connection volume
Tool Comparison
| Tool | Type | Best For | Scale |
|---|---|---|---|
| NGINX | Open Source | WebSocket proxying with upstream routing, proven at scale | Medium-Enterprise |
| Envoy | Open Source | Dynamic routing, gRPC + WebSocket, observability built in | Large-Enterprise |
| AWS API Gateway WebSocket | Managed | Serverless WebSocket, Lambda per-message dispatch, zero ops | Small-Large |
| Hocuspocus | Open Source | Yjs/CRDT-aware WebSocket server, collaborative editing | Small-Medium |
Common Mistakes
- Confusing the ALB with the gateway. After the WebSocket upgrade, ALB is a TCP pipe. It cannot inspect frames, enforce per-message auth, or route based on payload content.
- Not separating gateway from sync servers. Combining them works at small scale, but connection management cannot scale independently from application logic.
- Skipping connection draining on gateway deploys. Killing a gateway pod drops all its WebSocket connections and triggers a thundering herd reconnect.
- Setting ALB idle timeout too low. ALB closes connections idle beyond its timeout (default 60s). Set it to 3600s for WebSocket and rely on application-level ping/pong instead.
- Building message routing without a connection registry. Without knowing which gateway holds which user, the system falls back to broadcasting every message to every gateway at O(N) cost.