WebSocket Gateway
Architecture Diagram
Why It Exists
A load balancer like AWS ALB terminates TLS and routes the initial HTTP connection to a backend. For regular HTTP requests, that is the whole story. But WebSocket connections are different. After the HTTP Upgrade handshake, the protocol switches from request/response to persistent bidirectional frames. At that point, ALB stops understanding what is flowing through the connection. It becomes a byte-forwarding proxy, passing raw TCP payload between client and server without parsing, inspecting, or routing based on content.
Something needs to sit on the other side of ALB and actually manage these connections. That something is the WebSocket gateway. It tracks which clients are connected, routes connections to the correct backend server, and manages the connection lifecycle (keepalive, timeouts, draining). How deeply it inspects traffic depends on the pattern: a proxy gateway forwards WebSocket frames as raw bytes after routing at connection time, while a terminating gateway decodes every frame and makes per-message routing decisions. Most collaboration systems use the proxy pattern. Both patterns exist because without a gateway layer, sync servers or application servers are directly exposed to connection management, TLS overhead, and scaling concerns that have nothing to do with application logic.
How It Works
The Production Stack
A typical production architecture for WebSocket-heavy applications has four layers, each with a distinct responsibility:
| Layer | Component | Responsibility |
|---|---|---|
| Edge | Client (browser/app) | Opens WebSocket connection |
| Ingress | AWS ALB | TLS termination, routes HTTP Upgrade to gateway instances |
| Protocol | WebSocket Gateway | Manages connections, decodes frames, understands application messages |
| Application | Sync Servers | Application logic (documents, chat, game state) |
| Coordination | Redis / Database | Cross-node state synchronization |
Step 1 — Client Initiates Connection
The browser wants to connect to wss://collab.example.com/ws. It sends a standard HTTPS request:
GET /ws HTTP/1.1
Host: collab.example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13
Authorization: Bearer <JWT>
This is still normal HTTP at this point. Nothing special has happened yet.
Step 2 — TLS Termination at ALB
The connection between client and ALB is encrypted with TLS (wss://). ALB decrypts the traffic, which means everything after ALB travels as plaintext HTTP within the VPC. ALB can now inspect the HTTP headers: it can see the Upgrade: websocket header, the Authorization bearer token, the Host, and the path. It uses these to decide which target group (which set of gateway instances) to route the request to.
Step 3 — ALB Forwards the Upgrade Request
ALB forwards the HTTP request to a WebSocket gateway instance. At this point the request is still HTTP. ALB selects a gateway instance using its target group's routing algorithm (typically least outstanding requests rather than round robin for WebSocket traffic). The gateway has not done anything yet.
Step 4 — WebSocket Upgrade at the Gateway
The gateway performs the protocol upgrade. It responds:
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
At this moment, HTTP ends and WebSocket begins. The connection becomes persistent and bidirectional. The client and gateway can now exchange WebSocket frames in both directions.
The physical network path is still Client ⇄ ALB ⇄ Gateway. ALB remains in the middle. Every single WebSocket frame, in both directions, passes through ALB for the lifetime of the connection. But ALB is no longer interpreting what it sees. It is forwarding bytes.
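The Sec-WebSocket-Accept value is not arbitrary: RFC 6455 defines it as the base64-encoded SHA-1 of the client's Sec-WebSocket-Key concatenated with a fixed GUID. A minimal sketch of that computation (the key below is the RFC 6455 sample value, the same one shown in the handshake above):

import base64
import hashlib

WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"  # fixed GUID from RFC 6455

def accept_key(sec_websocket_key: str) -> str:
    # SHA-1 over key + GUID, base64-encoded: the value the gateway returns
    # in the Sec-WebSocket-Accept header of the 101 response.
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")

print(accept_key("dGhlIHNhbXBsZSBub25jZQ=="))  # → s3pPLMBiTxaQ9kYGzzhZRbK+xOo=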
Step 5 — Connection Lifecycle Management
The WebSocket connection stays open for a long time, potentially hours or days. The gateway manages the entire lifecycle:
- Connection tracking. The gateway maintains a registry mapping socket connections to users: { socket_123: userA, socket_124: userB }. This is where the system knows which user is connected and on which gateway instance.
- Ping/pong keepalive. The gateway sends ping frames every 30 seconds. If a client does not respond with pong within the timeout, the gateway closes the connection and removes it from the registry. This catches silent TCP drops from NAT timeouts, mobile network switches, and client crashes.
- Timeouts and reconnections. The gateway enforces idle timeouts, handles graceful close handshakes, and processes reconnection attempts (validating tokens again on reconnect).
ALB does none of this. It does not manage WebSocket sessions. It does not track connections. It forwards packets.
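A minimal sketch of that lifecycle management in Python with asyncio. The registry and keepalive loop below are illustrative: the ws object is assumed to expose ping() and close() coroutines (most WebSocket server libraries provide something equivalent), and PING_INTERVAL matches the 30-second figure above:

import asyncio

PING_INTERVAL = 30    # seconds between pings, as described above
PONG_TIMEOUT = 10     # how long to wait for a pong before dropping the socket

class ConnectionRegistry:
    """Maps socket ids to users so the gateway can deliver messages per user."""

    def __init__(self):
        self.connections = {}                    # socket_id -> (user_id, ws)

    def register(self, socket_id, user_id, ws):
        self.connections[socket_id] = (user_id, ws)

    def unregister(self, socket_id):
        self.connections.pop(socket_id, None)

async def keepalive(registry: ConnectionRegistry):
    """Ping every tracked connection; drop the ones that stop answering."""
    while True:
        await asyncio.sleep(PING_INTERVAL)
        for socket_id, (user_id, ws) in list(registry.connections.items()):
            try:
                pong = await ws.ping()                       # assumed API
                await asyncio.wait_for(pong, PONG_TIMEOUT)   # wait for the pong
            except (asyncio.TimeoutError, ConnectionError):
                await ws.close()
                registry.unregister(socket_id)               # silent drop detected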
Step 6 — Application Message Routing
This is where the gateway earns its name.
A client sends a message:
{ "type": "join", "docId": "doc-abc-123" }
This becomes a WebSocket frame:
FIN=1, opcode=0x1 (text), payload={"type":"join","docId":"doc-abc-123"}
The frame travels: Client → ALB → Gateway.
What ALB sees: After TLS decryption, ALB sees raw bytes: 81 2A 7B 22 74 79 70 65 .... It treats this as TCP payload. It receives bytes and forwards bytes. No parsing. No inspection. No routing decisions.
What the gateway does: The gateway decodes the WebSocket frame, strips the framing header, and reads the application payload. It now understands the message:
if message.type == "join":
    doc_id = message.doc_id
    server = consistent_hash(doc_id)  # → sync-server-2
    route(message, server)
This is what "understanding application-level protocol" means. The gateway understands the meaning of messages, not just bytes. It can make routing decisions based on content. ALB cannot do this.
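The consistent_hash helper referenced above can be as small as a sorted hash ring over the sync-server names. A minimal sketch, assuming servers are identified by hostname strings (virtual nodes and replication are left out for brevity):

import bisect
import hashlib

class ConsistentHashRing:
    def __init__(self, servers):
        # Place each server at a point on the ring determined by its hash.
        self.ring = sorted((self._hash(s), s) for s in servers)
        self.points = [h for h, _ in self.ring]

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def server_for(self, doc_id: str) -> str:
        # Walk clockwise to the first server at or after the key's position.
        idx = bisect.bisect(self.points, self._hash(doc_id)) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["sync-server-1", "sync-server-2", "sync-server-3"])
ring.server_for("doc-abc-123")  # every gateway resolves this docId to the same server

The advantage over a plain hash(docId) % N (used as shorthand later in this section) is that adding or removing a server only remaps the keys adjacent to it on the ring.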
ALB vs WebSocket Gateway
| Aspect | ALB | WebSocket Gateway |
|---|---|---|
| TLS termination | Yes | No (ALB already did it) |
| WebSocket support | Passes through after HTTP Upgrade | Actively manages the connection |
| What it sees after upgrade | Raw bytes (81 2A 7B 22...) | Decoded application messages ({"type":"join","docId":"doc-123"}) |
| Routing | Routes the connection to a target group | Routes individual messages to specific sync servers based on content |
| Auth | Limited (Cognito for HTTP, not per-message) | Full — validates JWT at handshake, can authorize per-document access |
| Connection awareness | Health checks and draining | Tracks every connection per user, per document |
| Per-message logic | None — transparent TCP pipe | Inspect, throttle, reject, or route individual frames |
| Connection limits | Per-target-group, coarse | Per-instance, per-document, per-user — fine-grained |
The short version: ALB handles the first 200ms (TLS + routing the HTTP Upgrade). The gateway handles the next 2 hours (connection management, message routing, protocol intelligence).
Gateway Routing to Sync Servers
Once the gateway understands the application message, it routes to the correct sync server. With three sync servers:
docId = "doc-abc-123"
hash("doc-abc-123") % 3 → sync-server-2
Messages flow: Client → ALB → Gateway → Sync Server 2
The sync server handles application logic: CRDT document state, chat room management, game state updates. When the sync server produces a response or broadcast, the path reverses: Sync Server → Gateway → ALB → Client.
Gateway Patterns: Proxy vs Terminate
Everything above described the termination pattern, where the gateway decodes every WebSocket frame and understands the application message. But that is only one of two ways to build a WebSocket gateway. The choice between them is one of the most important architectural decisions for a real-time system.
Pattern A — Transparent Proxy
Client ⇄ WebSocket ⇄ Gateway ⇄ WebSocket ⇄ Sync Server
The gateway does not parse WebSocket frames. It forwards bytes. The WebSocket connection effectively spans the entire path from client to sync server, with the gateway acting as a transparent relay.
Routing happens once, during the initial HTTP Upgrade handshake. The client includes the routing key in the connection URL:
wss://collab.example.com/ws?docId=doc-abc-123
The gateway reads docId from the query parameter, hashes it to select a sync server, and establishes the upstream WebSocket connection. After that, every frame flows straight through. The gateway copies bytes in both directions without inspecting them.
| Component | Responsibility |
|---|---|
| Gateway | Connection-time routing, byte forwarding (plus TLS termination if no ALB sits in front) |
| Sync server | WebSocket protocol handling, application logic |
Work per message: copy bytes, forward bytes. Very cheap. The gateway is doing the same work as a TCP proxy.
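A minimal sketch of proxy-mode forwarding, assuming the Python websockets package for the upstream connection and the hash ring sketched earlier for routing. How the docId query parameter is read from the upgrade request differs between library versions, so it is shown here as an argument:

import asyncio
import websockets  # assumed: the Python "websockets" package

async def handle_client(client_ws, doc_id):
    # doc_id came from the ?docId= query parameter during the HTTP Upgrade.
    upstream_url = f"ws://{ring.server_for(doc_id)}:4000/ws"

    async with websockets.connect(upstream_url) as upstream_ws:
        # From here on the gateway is a relay: no decoding, no inspection.
        async def pump(src, dst):
            async for message in src:   # each frame/message is forwarded as-is
                await dst.send(message)

        await asyncio.gather(pump(client_ws, upstream_ws),
                             pump(upstream_ws, client_ws))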
Pattern B — WebSocket Termination
Client ⇄ WebSocket ⇄ Gateway ──→ internal protocol ──→ Sync Server
The gateway terminates the WebSocket connection. It decodes every frame, parses the application payload, and makes per-message routing decisions. The sync server does not speak WebSocket at all — it receives messages via an internal protocol (HTTP, gRPC, TCP, or a message queue).
# Gateway logic for every incoming message. decode_websocket_frame, consistent_hash,
# forward, rate_limit, and send_error are the gateway's own helpers.
import json

def handle_raw_frame(raw_bytes, client, user_id):
    frame = decode_websocket_frame(raw_bytes)
    message = json.loads(frame.payload)

    if message["type"] == "join":
        server = consistent_hash(message["docId"])
        forward(message, server)
    elif message["type"] == "edit":
        if not rate_limit.allow(user_id):
            send_error(client, "rate limited")
            return
        server = consistent_hash(message["docId"])
        forward(message, server)
Work per message: decode WebSocket frame, parse JSON/binary payload, apply routing logic, potentially validate, rate limit, or filter, then forward. Much more CPU per message than proxy mode.
| Component | Responsibility |
|---|---|
| Gateway | WebSocket termination, frame decoding, message parsing, per-message routing, rate limiting, filtering |
| Sync server | Application logic only (receives structured messages via internal protocol) |
Performance and Scaling
The scaling difference between the two patterns is significant:
| Aspect | Proxy Mode | Termination Mode |
|---|---|---|
| Gateway work scales with | O(connections) | O(messages) |
| CPU per message | Negligible (byte copy) | Measurable (decode + parse + route) |
| Bottleneck | File descriptors, memory | CPU, message throughput |
| Gateway complexity | Simple (stateless relay) | Complex (protocol-aware, stateful logic) |
Message volume is always much larger than connection volume. A single WebSocket connection might carry thousands of messages per minute. In proxy mode, the gateway does not care. In termination mode, every one of those messages costs CPU.
Think of it like a mail system. Proxy mode is a mail truck: it picks up packages and delivers them without opening them. Termination mode is a sorting center: it opens every package, reads the contents, decides where to send it. Opening every package is expensive.
When to Use Which
| Question | If yes | If no |
|---|---|---|
| Does the gateway need to inspect message content? | Terminate | Proxy |
| Is per-message routing needed (different messages to different servers)? | Terminate | Proxy |
| Is protocol translation needed (WebSocket to gRPC, HTTP, etc.)? | Terminate | Proxy |
| Is routing determined entirely at connection time? | Proxy | — |
| Is per-message rate limiting or filtering needed at the gateway? | Terminate | Proxy |
What Real Systems Use
Most collaboration and real-time systems use proxy mode:
- Figma — routes by file ID at connection time, then proxies
- Notion — routes by block/page at connection time
- Linear — routes by workspace at connection time
- Hocuspocus/Yjs setups — routes by document ID at connection time
The reason is consistent: routing happens once during the handshake. Once the client is connected to the right sync server, every subsequent message belongs to the same document. There is no need to inspect frames.
Termination mode is used by systems that need per-message intelligence:
- API gateways with WebSocket support (AWS API Gateway WebSocket dispatches each message to a Lambda)
- Trading systems (inspect every order, validate, route to matching engine)
- Real-time analytics pipelines (parse events, filter, fan out to multiple consumers)
- Protocol translation layers (WebSocket clients talking to gRPC backends)
Recommended Architecture for Collaborative Editing
For Hocuspocus/Yjs systems, the proxy pattern is the standard choice:
Client
│
▼
ALB (TLS termination)
│
▼
WebSocket Gateway (proxy mode, routes by docId at connection time)
│
▼
Hocuspocus Servers (WebSocket + CRDT logic)
│
▼
Redis (cross-instance sync via @hocuspocus/extension-redis)
The gateway reads docId from the connection URL, consistent-hashes it to a Hocuspocus instance, and forwards all subsequent frames without inspection. Hocuspocus handles the WebSocket protocol, Yjs sync, and document state. The gateway's only ongoing job is keeping the connection alive and forwarding bytes.
Production Considerations
Why This Layering Exists
Separating the gateway from sync servers enables independent scaling. The gateway tier is stateless (it holds open connections but no application state). The sync tier is stateful (it holds Yjs documents, chat rooms, or game state in memory). These have completely different scaling characteristics:
- Gateway scaling is driven by connection count. Each connection costs 10-50 KB of memory (socket buffers, framing state, per-connection metadata). At 50K connections per instance, a gateway pod uses 500 MB to 2.5 GB just for connection overhead.
- Sync server scaling is driven by compute and memory for application state. A Yjs document can be 15-100 KB in memory. CPU usage depends on edit frequency, not connection count.
At 500K concurrent connections, a typical setup runs 10-15 gateway instances and only 3-5 sync servers, because the bottleneck is different for each tier.
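A back-of-the-envelope check of those figures (just the arithmetic from the numbers above, not a benchmark):

connections_per_pod = 50_000
per_connection_kb = (10, 50)                 # low and high estimates from above

low_gb = connections_per_pod * per_connection_kb[0] / 1_000_000
high_gb = connections_per_pod * per_connection_kb[1] / 1_000_000
print(f"{low_gb:.1f} GB to {high_gb:.1f} GB per gateway pod")   # 0.5 GB to 2.5 GB

total_connections = 500_000
print(total_connections // connections_per_pod)  # 10 pods at the 50K ceiling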
Full End-to-End Flow
The message flows below are drawn for the termination pattern; in proxy mode the gateway's per-message steps reduce to forwarding frames to the sync server selected at connection time.
Connection creation:
Client
├── TLS handshake ──→ ALB
├── HTTP Upgrade ────→ ALB forwards to Gateway
├── 101 Switching ───← Gateway accepts upgrade
└── Persistent WebSocket connection established
Client-to-server message flow:
Client sends WebSocket frame
→ ALB forwards bytes (no parsing)
→ Gateway decodes frame, reads payload
→ Routes to correct sync server (hash on docId)
→ Sync server processes application logic
Server-to-client response flow:
Sync server produces update
→ Gateway receives, encodes as WebSocket frame
→ ALB forwards bytes
→ Client receives update
Broadcast flow (one edit reaching all other editors):
Sync server produces update for docId
→ Publishes to Redis Pub/Sub channel for docId
→ All gateways subscribed to that channel receive update
→ Each gateway sends to its locally connected clients for that document
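A minimal sketch of that broadcast leg, assuming redis-py's asyncio client (redis.asyncio); the channel naming and the local_clients lookup are illustrative:

import redis.asyncio as redis  # assumed: the redis-py package

r = redis.Redis()

async def publish_update(doc_id: str, update: bytes):
    # Sync server publishes the update for one document.
    await r.publish(f"doc:{doc_id}", update)

async def fan_out(doc_id: str, local_clients):
    # Each gateway subscribes to the channels for the documents it hosts and
    # forwards updates only to its locally connected clients.
    pubsub = r.pubsub()
    await pubsub.subscribe(f"doc:{doc_id}")
    async for msg in pubsub.listen():
        if msg["type"] != "message":
            continue
        for ws in local_clients(doc_id):   # assumed lookup: docId -> local sockets
            await ws.send(msg["data"])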
Failure Scenarios
Scenario 1: Gateway Pod Crashes. All WebSocket connections on that pod drop immediately. Clients detect the disconnect and attempt to reconnect. If all clients reconnect at once, the remaining gateway pods see a connection spike (thundering herd).
- Detection: monitor connections_per_second and alert when it exceeds 5x the normal rate.
- Recovery: client-side reconnection with exponential backoff plus jitter (delay = min(base * 2^attempt + random(0, 1000ms), 30s); see the sketch below). Server-side, enforce connection rate limiting per pod (max 1,000 new connections/second).
- Prevention: implement graceful drain, send a custom RECONNECT frame with a random delay hint before shutting down, and set Kubernetes terminationGracePeriodSeconds to match the drain timeout.
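A minimal client-side sketch of that backoff formula (the base delay, jitter range, and 30-second cap are the values quoted above):

import random

BASE_DELAY = 1.0   # seconds
MAX_JITTER = 1.0   # random(0, 1000ms)
MAX_DELAY = 30.0   # cap

def reconnect_delay(attempt: int) -> float:
    # delay = min(base * 2^attempt + random(0, 1000ms), 30s)
    return min(BASE_DELAY * 2 ** attempt + random.uniform(0, MAX_JITTER), MAX_DELAY)

[round(reconnect_delay(n), 1) for n in range(6)]  # roughly 1s, 2s, 4s, 8s, 16s, 30s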
Scenario 2: ALB Idle Timeout Kills WebSocket Connections. ALB has a default idle timeout of 60 seconds. If no data flows across a WebSocket connection for 60 seconds, ALB terminates it. The gateway and client both see an unexpected disconnect.
- Detection: monitor the connection duration distribution. If most connections die around the 60-second mark, the idle timeout is the cause.
- Recovery: raise the ALB idle timeout (a load-balancer-level attribute, configurable up to 4,000 seconds; 3,600 is a common choice for WebSocket traffic). On the application side, send ping frames every 30 seconds so connections are never idle. Do both.
Scenario 3: Gateway Cannot Reach Sync Servers. The network between the gateway and the sync server tier is disrupted (security group change, sync server pod crash, DNS failure). The gateway has open client connections but cannot route messages to any backend.
- Detection: monitor gateway-to-sync-server error rate and latency. Alert when the error rate exceeds 1%.
- Recovery: the gateway should buffer messages briefly (5-10 seconds) and retry, as sketched below. If the sync server remains unreachable, send an error frame to the client and let the client fall back to offline mode. The CRDT on the client side will hold all edits and sync when the path is restored.
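A minimal sketch of that buffer-and-retry behaviour; forward_to_sync_server and send_error_frame are illustrative helpers, with forward_to_sync_server assumed to raise ConnectionError on failure:

import asyncio

BUFFER_WINDOW = 10.0    # seconds to keep retrying before giving up
RETRY_INTERVAL = 1.0

async def forward_with_retry(message, server, client):
    deadline = asyncio.get_running_loop().time() + BUFFER_WINDOW
    while True:
        try:
            await forward_to_sync_server(message, server)   # assumed helper
            return
        except ConnectionError:
            if asyncio.get_running_loop().time() >= deadline:
                # Give up: tell the client so it can fall back to offline mode;
                # the client-side CRDT holds the edits until the path recovers.
                await send_error_frame(client, "sync server unreachable")
                return
            await asyncio.sleep(RETRY_INTERVAL)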
Key Points
- Sits between the load balancer and sync servers, owning the WebSocket connection and managing its lifecycle
- ALB terminates TLS and routes the initial HTTP Upgrade, but after the upgrade it becomes a byte-forwarding proxy. The gateway is where protocol intelligence lives
- Two patterns: proxy mode (forward bytes, route at connection time) vs termination mode (decode every frame, route per message). Most collaboration systems use proxy mode
- Maintains a connection registry mapping sockets to users, enabling targeted message delivery instead of broadcast
- Proxy mode scales at O(connections), termination mode at O(messages). Message volume is always much larger than connection volume
Tool Comparison
| Tool | Type | Best For | Scale |
|---|---|---|---|
| NGINX | Open Source | WebSocket proxying with upstream routing, proven at scale | Medium-Enterprise |
| Envoy | Open Source | Dynamic routing, gRPC + WebSocket, observability built in | Large-Enterprise |
| AWS API Gateway WebSocket | Managed | Serverless WebSocket, Lambda per-message dispatch, zero ops | Small-Large |
| Hocuspocus | Open Source | Yjs/CRDT-aware WebSocket server, collaborative editing | Small-Medium |
Common Mistakes
- Confusing the ALB with the gateway. After the WebSocket upgrade, ALB is a TCP pipe. It cannot inspect frames, enforce per-message auth, or route based on payload content.
- Not separating gateway from sync servers. Combining them works at small scale, but connection management cannot scale independently from application logic.
- Skipping connection draining on gateway deploys. Killing a gateway pod drops all its WebSocket connections and triggers a thundering herd reconnect.
- Setting ALB idle timeout too low. ALB closes connections idle beyond its timeout (default 60s). Set it to 3600s for WebSocket and rely on application-level ping/pong instead.
- Building message routing without a connection registry. Without knowing which gateway holds which user, the system falls back to broadcasting every message to every gateway at O(N) cost.