Hocuspocus
The Yjs WebSocket server that handles sync, auth, and persistence so teams do not have to
Architecture
Yjs handles the CRDT math, but it says nothing about servers. If two users need to collaborate on a document, their Yjs instances need to exchange updates somehow. One option is to build a raw WebSocket server that relays binary messages between clients. But that server then also needs authentication (who can access this document?), persistence (where do edits go when the server restarts?), reconnection logic (what happens when a client drops off for 30 seconds?), and multi-server coordination (what if there are 3 servers behind a load balancer?). That is a lot of infrastructure code that has nothing to do with the product.
Hocuspocus handles all of it. It is the server counterpart to Yjs the way Express is the server counterpart to HTTP.
How the Sync Protocol Works (Step by Step)
Here is exactly what happens when a client connects to edit a document. Every step matters, and understanding the sequence helps debug the inevitable "why is my document not syncing" issue.
Step 1. Client opens a WebSocket connection to wss://collab.example.com/document/abc-123. The document name is extracted from the URL path.
Step 2. Hocuspocus fires the onConnect hook. Cheap, early checks run here: inspect the request headers, the origin, or an IP blocklist before any heavier work happens. Return a promise that resolves to allow the connection or rejects to close it with a 403. If neither this hook nor onAuthenticate is implemented, every connection is accepted. Fine for development, dangerous for production.
Step 3. Hocuspocus fires onAuthenticate. This is separate from onConnect because authentication can be more expensive (database lookup for permissions, checking team membership). User metadata attaches here: name, avatar URL, cursor color, role. This metadata flows into the Yjs awareness protocol so other clients can display it.
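The two-stage check in steps 2 and 3 can be sketched as plain functions. This is a minimal illustration, not the Hocuspocus API: `ConnectRequest`, `UserContext`, `checkConnection`, `authenticate`, and the `lookup` callback are all hypothetical names, and `lookup` stands in for JWT verification plus a permissions query.

```typescript
// Hypothetical shapes for illustration; these are not Hocuspocus types.
interface ConnectRequest {
  origin: string;
  documentName: string;
  token: string | null;
}

interface UserContext {
  userId: string;
  name: string;
  role: "viewer" | "editor";
}

const ALLOWED_ORIGINS = ["https://app.example.com"];

// Stage 1 (onConnect-style): cheap checks that need no database access.
function checkConnection(req: ConnectRequest): void {
  if (ALLOWED_ORIGINS.indexOf(req.origin) === -1) {
    throw new Error("forbidden origin");
  }
  if (req.token === null) {
    throw new Error("missing token");
  }
}

// Stage 2 (onAuthenticate-style): resolve the token to a user and check access.
function authenticate(
  req: ConnectRequest,
  lookup: (token: string, documentName: string) => UserContext | null,
): UserContext {
  const user = req.token === null ? null : lookup(req.token, req.documentName);
  if (user === null) {
    throw new Error("not authorized for " + req.documentName);
  }
  return user; // metadata that would be attached to the connection
}
```

Keeping the cheap stage separate means a request from a bad origin is rejected before any token lookup runs.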
Step 4. Hocuspocus checks its in-memory document cache. If document abc-123 is already loaded (other clients are editing it), skip to step 6. If not, Hocuspocus fires onLoadDocument. The persistence adapter reads the latest Yjs binary snapshot from PostgreSQL:
SELECT yjs_state FROM documents WHERE name = 'abc-123';
The binary blob is applied to a fresh Y.Doc in memory. The document is now live on this server.
Step 5. If no persisted state exists (new document), Hocuspocus creates an empty Y.Doc. The onLoadDocument hook can optionally seed it with default content.
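Steps 4 and 5 amount to a load-or-create cache. The sketch below models that logic only; `DocumentCache`, `LoadedDoc`, and `fetchState` are hypothetical names, and a byte array stands in for a real Y.Doc.

```typescript
// Hypothetical in-memory model; `LoadedDoc` stands in for a Y.Doc.
interface LoadedDoc {
  name: string;
  state: Uint8Array; // empty for a brand-new document
  connections: number;
}

class DocumentCache {
  public loads = 0; // counts trips to the persistence layer
  private docs: { [name: string]: LoadedDoc } = {};
  private fetchState: (name: string) => Uint8Array | null;

  constructor(fetchState: (name: string) => Uint8Array | null) {
    this.fetchState = fetchState;
  }

  open(name: string): LoadedDoc {
    let doc: LoadedDoc | undefined = this.docs[name];
    if (doc === undefined) {
      // Cache miss: read the persisted snapshot, or start empty if none exists.
      this.loads += 1;
      const persisted = this.fetchState(name);
      doc = {
        name: name,
        state: persisted !== null ? persisted : new Uint8Array(0),
        connections: 0,
      };
      this.docs[name] = doc;
    }
    doc.connections += 1;
    return doc;
  }
}
```

A second client opening the same document hits the cache, so the database is read once per document per server, not once per connection.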
Step 6. Hocuspocus sends the client a Yjs sync step 1 message containing the server's state vector. The state vector is a compact map: {clientID_1: 42, clientID_2: 108} meaning "I have seen all operations from client 1 up to clock 42, and all from client 2 up to clock 108."
Step 7. The client compares the server's state vector against its own and computes the diff: the operations the server does not have. It sends those as a binary blob (sync step 2), and the server applies them. The client also sends its own state vector in a sync step 1 message of its own, so the server can compute what the client is missing and send those updates back. Both sides are now synchronized. The whole handshake takes about one round trip.
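The state vector exchange in steps 6 and 7 can be modeled without any binary encoding. This is a simplified sketch, not Yjs itself: `Op`, `stateVector`, and `diff` are hypothetical names, operations are plain objects, and each state vector entry is assumed to hold the next clock expected from that client.

```typescript
// Simplified model: one operation = (client, clock). Real Yjs encodes these
// as binary structs; this sketch only shows the bookkeeping.
interface Op {
  client: number;
  clock: number;
  content: string;
}

// client -> next clock expected from that client (assumed convention)
type StateVector = { [client: number]: number };

function stateVector(ops: Op[]): StateVector {
  const sv: StateVector = {};
  ops.forEach((op) => {
    const next = op.clock + 1;
    if (sv[op.client] === undefined || sv[op.client] < next) {
      sv[op.client] = next;
    }
  });
  return sv;
}

// Sync step 2 payload: every local operation the remote side has not seen.
function diff(localOps: Op[], remote: StateVector): Op[] {
  return localOps.filter((op) => {
    const seen = remote[op.client];
    return seen === undefined || op.clock >= seen;
  });
}

const serverOps: Op[] = [
  { client: 1, clock: 0, content: "H" },
  { client: 1, clock: 1, content: "i" },
];
const clientOps: Op[] = [
  { client: 1, clock: 0, content: "H" },
  { client: 2, clock: 0, content: "!" },
];

// Each side answers the other's state vector with only the missing operations.
const clientToServer = diff(clientOps, stateVector(serverOps));
const serverToClient = diff(serverOps, stateVector(clientOps));
```

Here the client sends only the operation from client 2, and the server sends only the second operation from client 1. Neither side retransmits what the other already holds.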
Step 8. From this point on, every keystroke produces a small Yjs update. The client sends it over the WebSocket. Hocuspocus broadcasts it to all other connected clients for that document. When editing slows down and the debounce timer fires (default: 2 seconds of inactivity), Hocuspocus calls onStoreDocument. The adapter serializes the Y.Doc and writes it to PostgreSQL.
When the last client disconnects, Hocuspocus fires a final onStoreDocument, then unloads the document from memory. Memory usage stays proportional to active documents, not total documents in the database.
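The unload-on-last-disconnect behavior is essentially reference counting. A minimal sketch, assuming a hypothetical `ActiveDocuments` class and a `storeDocument` callback that stands in for the final onStoreDocument call:

```typescript
class ActiveDocuments {
  private connections: { [name: string]: number } = {};
  private storeDocument: (name: string) => void;

  constructor(storeDocument: (name: string) => void) {
    this.storeDocument = storeDocument;
  }

  connect(name: string): void {
    this.connections[name] = (this.connections[name] || 0) + 1;
  }

  // Returns true when this disconnect unloaded the document.
  disconnect(name: string): boolean {
    const current = this.connections[name];
    if (current === undefined) return false; // document is not loaded
    if (current > 1) {
      this.connections[name] = current - 1;
      return false;
    }
    this.storeDocument(name); // final flush before unloading
    delete this.connections[name];
    return true;
  }

  loadedCount(): number {
    return Object.keys(this.connections).length;
  }
}
```

Because entries are deleted as soon as the count reaches zero, `loadedCount()` tracks active documents rather than everything stored in the database.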
Scaling to Multiple Servers
A single Hocuspocus server holds documents in memory. With 3 servers behind a load balancer, User A might connect to server 1 and User B to server 2, both editing document abc-123. Without coordination, their edits stay on their respective servers and never meet. Both users think they are editing alone.
There are two approaches to solve this.
Redis pub/sub (recommended for production). Hocuspocus publishes every Yjs update to a Redis pub/sub channel named after the document. All servers subscribe to channels for their active documents. When server 1 receives an update from User A, it publishes to Redis. Server 2 picks it up, applies it to its in-memory copy of the document, and forwards it to User B. The added latency is typically on the order of 1-2ms per update when Redis sits on the same network as the servers.
The setup is straightforward:
import { Server } from '@hocuspocus/server'
import { Redis } from '@hocuspocus/extension-redis'

const server = Server.configure({
  extensions: [
    // Every instance points at the same Redis so updates fan out across servers
    new Redis({
      host: 'redis.internal',
      port: 6379,
    }),
  ],
})

server.listen()
Every server publishes and subscribes. No leader election, no sharding. Redis handles the fan-out.
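The fan-out described above can be simulated in memory to see why no leader is needed. This is an illustrative model only: `FakeBus` stands in for Redis pub/sub, and `CollabServer`, `join`, and `receive` are hypothetical names rather than Hocuspocus internals.

```typescript
type Handler = (channel: string, update: string, origin: string) => void;

// Stand-in for Redis pub/sub: every subscriber sees every published message.
class FakeBus {
  private handlers: Handler[] = [];

  subscribe(handler: Handler): void {
    this.handlers.push(handler);
  }

  publish(channel: string, update: string, origin: string): void {
    this.handlers.forEach((handler) => handler(channel, update, origin));
  }
}

class CollabServer {
  public delivered: string[] = []; // log of "client<-update" deliveries
  private clients: { [doc: string]: string[] } = {};
  private id: string;
  private bus: FakeBus;

  constructor(id: string, bus: FakeBus) {
    this.id = id;
    this.bus = bus;
    // Forward updates that originated on other servers to local clients.
    bus.subscribe((channel, update, origin) => {
      if (origin !== this.id) this.fanOut(channel, update, null);
    });
  }

  join(doc: string, client: string): void {
    this.clients[doc] = (this.clients[doc] || []).concat([client]);
  }

  // A local client sent an update: relay locally, then publish for other servers.
  receive(doc: string, fromClient: string, update: string): void {
    this.fanOut(doc, update, fromClient);
    this.bus.publish(doc, update, this.id);
  }

  private fanOut(doc: string, update: string, except: string | null): void {
    (this.clients[doc] || []).forEach((client) => {
      if (client !== except) this.delivered.push(client + "<-" + update);
    });
  }
}
```

The origin check keeps a server from re-delivering its own updates when they come back from the bus.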
Sticky sessions (acceptable for small deployments). Configure the load balancer to route all connections for the same document to the same server. Hash the document ID and mod by server count. Simpler than Redis, and no additional infrastructure. The downside: if that server dies, all sessions for its documents drop. Clients reconnect to a different server and load from the persisted state, but any edits not yet flushed are lost.
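The "hash the document ID and mod by server count" rule can be sketched as follows. The hash choice (32-bit FNV-1a over UTF-16 code units) and the names `fnv1a` and `pickServer` are assumptions for illustration; a real load balancer would use whatever hashing it supports.

```typescript
// 32-bit FNV-1a over the string's UTF-16 code units.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

// Route every connection for a document to the same server.
function pickServer(documentId: string, servers: string[]): string {
  if (servers.length === 0) throw new Error("no servers available");
  return servers[fnv1a(documentId) % servers.length];
}

const servers = ["collab-1", "collab-2", "collab-3"];
const target = pickServer("abc-123", servers);
```

One design caveat: with plain modulo hashing, adding or removing a server changes the mapping for most documents, which drops those sessions. Consistent hashing reduces that churn at the cost of extra complexity.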
In practice, Redis pub/sub is the right choice for anything beyond a prototype. Sticky sessions are a shortcut that stops working the moment failover or rolling deploys are needed.
Capacity numbers. A single Hocuspocus instance handles roughly 5,000 concurrent WebSocket connections comfortably on a 2-core, 4GB server. The bottleneck is memory, not CPU. Each active document consumes memory proportional to its size and edit history. A typical document with 10K characters of visible text and 50K internal CRDT items uses about 200KB of memory. With Redis pub/sub, a 3-node cluster handles approximately 15,000 concurrent connections across 3,000 active documents.
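The arithmetic behind these figures is worth checking against a specific workload. The sketch below plugs in the rough estimates quoted above; the function names are hypothetical, and the result covers document state only, not per-connection overhead.

```typescript
// All inputs are the rough estimates quoted above, not measurements.
const BYTES_PER_KB = 1000;
const BYTES_PER_MB = 1000 * 1000;

// Memory held by in-memory documents only; connection overhead is excluded.
function documentMemoryMb(activeDocs: number, kbPerDoc: number): number {
  return (activeDocs * kbPerDoc * BYTES_PER_KB) / BYTES_PER_MB;
}

// Cluster capacity, assuming connections spread evenly across identical nodes.
function clusterConnections(nodes: number, connectionsPerNode: number): number {
  return nodes * connectionsPerNode;
}

const clusterDocsMb = documentMemoryMb(3000, 200); // all 3,000 active documents
const perNodeDocsMb = documentMemoryMb(3000 / 3, 200); // even split across 3 nodes
const totalConnections = clusterConnections(3, 5000);
```

With these inputs the cluster holds about 600MB of document state, or 200MB per node on an even split, and 15,000 connections in total. The per-node figure is a lower bound: with Redis pub/sub, a document edited from several servers is resident in memory on each of them.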
Persistence Strategies
The naive approach is to persist on every Yjs update. With 100 editors typing simultaneously on a document, that is hundreds of database writes per second. That burns PostgreSQL IOPS for no benefit, because most of those writes are immediately superseded by the next keystroke.
Hocuspocus debounces by default. It waits for a configurable period of inactivity (default: 2,000ms) before calling onStoreDocument. Rapid edits are batched into a single write once typing pauses. For most applications, that means at most about one database write per document every 2 seconds during active editing, and zero writes when nobody is typing.
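To see how debouncing shapes the write pattern, the timing can be simulated without real timers. This is a sketch under stated assumptions: `simulateWrites` is a hypothetical helper, and `maxWaitMs` is an assumed upper bound that forces a flush during uninterrupted typing, since a pure inactivity timer would otherwise never fire.

```typescript
// Returns the timestamps (ms) at which a debounced store would run.
function simulateWrites(
  editTimesMs: number[], // ascending edit timestamps
  debounceMs: number, // quiet period required before a write
  maxWaitMs: number, // assumed cap on how long unsaved edits may wait
): number[] {
  const writes: number[] = [];
  let firstPending: number | null = null; // oldest unsaved edit
  let lastEdit = 0;
  for (const t of editTimesMs) {
    if (firstPending !== null) {
      const due = Math.min(lastEdit + debounceMs, firstPending + maxWaitMs);
      if (due <= t) {
        // The timer fired before this edit arrived.
        writes.push(due);
        firstPending = null;
      }
    }
    if (firstPending === null) firstPending = t;
    lastEdit = t;
  }
  if (firstPending !== null) {
    writes.push(Math.min(lastEdit + debounceMs, firstPending + maxWaitMs));
  }
  return writes;
}

// Two bursts of typing separated by a pause.
const burstWrites = simulateWrites([0, 500, 1000, 5000, 5200], 2000, 10000);

// 20 seconds of uninterrupted typing, one edit every 100ms.
const typing: number[] = [];
for (let t = 0; t < 20000; t += 100) typing.push(t);
const continuousWrites = simulateWrites(typing, 2000, 10000);
```

The five edits in the first case collapse into two writes, at 3000ms and 7200ms. The 200 edits in the second case also produce two writes, at 10000ms and 20000ms, both triggered by the assumed cap rather than by inactivity.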
Two storage strategies, each with clear tradeoffs:
Snapshot only. Store the full Yjs binary blob in a single column. Simple schema:
CREATE TABLE documents (
  name TEXT PRIMARY KEY,
  yjs_state BYTEA NOT NULL,
  updated_at TIMESTAMPTZ DEFAULT now()
);
Recovery is fast: load one blob, apply to a fresh Y.Doc, done. The downside is no history. There is no way to recover the document state from 3 hours ago. No audit trail is possible. If the latest snapshot is corrupted (rare but not impossible), there is nothing to fall back on.
Snapshot plus incremental log. Store periodic snapshots and every individual Yjs update in an append-only log:
CREATE TABLE documents (
  name TEXT PRIMARY KEY,
  yjs_state BYTEA NOT NULL,
  updated_at TIMESTAMPTZ DEFAULT now()
);

CREATE TABLE document_updates (
  id BIGSERIAL PRIMARY KEY,
  doc_name TEXT NOT NULL REFERENCES documents(name),
  yjs_update BYTEA NOT NULL,
  created_at TIMESTAMPTZ DEFAULT now()
);
This enables point-in-time recovery (replay updates from any snapshot forward), audit trails (who changed what and when, with a user_id column), and undo beyond the client session. The tradeoff is more storage and more complex recovery logic. For most applications, the snapshot-only approach is sufficient. Add the incremental log when compliance, history, or undo past browser sessions is needed.
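Point-in-time recovery boils down to "start from a snapshot, replay the log entries it does not already contain, stop at the target time". A minimal sketch of that selection logic follows. The row shapes are hypothetical: `coversUpToId` is an assumed extra column recording the last log entry folded into the snapshot, and `applyUpdate` is a stand-in for applying a Yjs update to a document.

```typescript
// Hypothetical row shapes. `coversUpToId` is an assumed extra column that
// records the last log entry already folded into the snapshot.
interface SnapshotRow {
  state: string[];
  coversUpToId: number;
}

interface UpdateRow {
  id: number;
  createdAtMs: number;
  update: string;
}

// Stand-in for applying a Yjs update: the "state" is the list of applied updates.
function applyUpdate(state: string[], update: string): string[] {
  return state.concat([update]);
}

function restoreAt(snapshot: SnapshotRow, log: UpdateRow[], targetMs: number): string[] {
  return log
    .filter((row) => row.id > snapshot.coversUpToId && row.createdAtMs <= targetMs)
    .sort((a, b) => a.id - b.id)
    .reduce((state, row) => applyUpdate(state, row.update), snapshot.state);
}
```

The snapshot used as the starting point has to be older than the target time, which is why periodic snapshots are kept rather than only the latest one.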
Lifecycle Hooks Reference
The hook execution order matters for debugging:
- onConnect fires when a new WebSocket connection comes in, before authentication. Reject bad connections early here (wrong origin, missing headers, IP blocklist).
- onAuthenticate fires after connect. Verify the JWT, check permissions, attach user metadata. This is where most teams do their authorization logic.
- onLoadDocument fires when a document is accessed for the first time on this server. Load from database. Seed default content for new documents.
- onChange fires on every Yjs update. Use it for logging, analytics, or triggering webhooks. Keep it lightweight because it runs on every keystroke.
- onStoreDocument fires after the debounce period. Persist to database.
- onDisconnect fires when a client drops. Clean up user-specific state, update presence.
Not all hooks need implementation. A minimal production setup uses onAuthenticate (verify JWT), onLoadDocument (read from PostgreSQL), and onStoreDocument (write to PostgreSQL). Everything else is optional.
Failure Scenarios
Scenario 1: Server crash with unsaved edits. A Hocuspocus server crashes. Its in-memory Y.Doc instances had accumulated 2 seconds of edits since the last persistence flush. Those edits are gone. Clients detect the dropped WebSocket, reconnect to another server (via the load balancer), and that server loads the last persisted snapshot from PostgreSQL. Editing continues, but users lose their last few keystrokes.
How bad is this? For most collaborative editing, losing 2 seconds of keystrokes is annoying but survivable. Users notice a brief flicker as the editor reloads and their last word disappears.
Mitigation: Reduce the debounce interval for critical documents (500ms instead of 2000ms). Or use the incremental log strategy: append updates to the document_updates table on a much shorter debounce (200ms), while keeping the full snapshot debounce at 2 seconds. Recovery then replays the incremental log on top of the last snapshot, losing at most about 200ms of edits.
Scenario 2: Redis pub/sub goes down. Multi-server fan-out stops. Clients on different servers edit the same document independently. Their changes diverge. When Redis comes back, the pub/sub channels resume, but Hocuspocus does not automatically reconcile the diverged states. Each server has a different version of the document.
Detection: Monitor Redis pub/sub latency and connection status as an SLI. Alert when any Hocuspocus instance loses its Redis connection.
Recovery: Once Redis is reachable again, force all clients on the affected documents to reconnect. This triggers a fresh sync from the persisted state, which is the last known good version. Some edits made during the Redis outage may be lost (the ones that were not persisted). The alternative is to switch the editor to read-only mode during Redis downtime, which is safer but more disruptive.
Scenario 3: Slow persistence causing memory pressure. PostgreSQL is under heavy load. The onStoreDocument calls start taking 5 seconds instead of 50ms. Meanwhile, documents keep accumulating in-memory updates. With 1,000 active documents each growing by 10KB per second of unsaved edits, memory usage climbs by 10MB per second. After a few minutes, the server runs out of memory.
Prevention: Set a timeout on persistence calls. If PostgreSQL does not respond within 3 seconds, log the failure and retry on the next debounce cycle. Monitor the persistence queue depth. If it grows beyond a threshold, start rejecting new connections (backpressure) rather than silently accumulating unbounded in-memory state.
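The timeout and backpressure rules can be sketched as a small bookkeeping class. To keep it runnable without timers or a database, the clock is passed in explicitly; a real implementation would wrap the database call in a timer. `PersistenceGate` and all of its method names are hypothetical.

```typescript
class PersistenceGate {
  private startedAt: { [doc: string]: number } = {}; // in-flight stores
  private dirty: { [doc: string]: boolean } = {}; // documents with unsaved edits
  private timeoutMs: number;
  private maxDirty: number;

  constructor(timeoutMs: number, maxDirty: number) {
    this.timeoutMs = timeoutMs;
    this.maxDirty = maxDirty;
  }

  markDirty(doc: string): void {
    this.dirty[doc] = true;
  }

  beginStore(doc: string, nowMs: number): void {
    this.startedAt[doc] = nowMs;
  }

  finishStore(doc: string): void {
    delete this.startedAt[doc];
    delete this.dirty[doc];
  }

  // Give up on stores that exceeded the timeout. The document stays dirty,
  // so it is retried on the next debounce cycle.
  sweep(nowMs: number): string[] {
    const expired = Object.keys(this.startedAt).filter(
      (doc) => nowMs - this.startedAt[doc] >= this.timeoutMs,
    );
    expired.forEach((doc) => {
      delete this.startedAt[doc];
    });
    return expired;
  }

  queueDepth(): number {
    return Object.keys(this.dirty).length;
  }

  // Backpressure: refuse new connections once too many documents are unsaved.
  acceptsNewConnections(): boolean {
    return this.queueDepth() < this.maxDirty;
  }
}
```

The names returned by `sweep` are the ones to log as persistence failures, and `queueDepth()` is the number to export as a metric and alert on.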
Pros
- Native Yjs binary sync protocol. No JSON serialization overhead
- Hook-based lifecycle (onConnect, onAuthenticate, onLoadDocument, onStoreDocument) for custom logic
- Persistence adapters for PostgreSQL, SQLite, Redis, and S3
- Horizontal scaling via Redis pub/sub for multi-server coordination
Cons
- Node.js only. No Go, Rust, or Python server implementation
- Single-document-per-connection model complicates multi-document UIs
- No built-in rate limiting. Must be added separately
- Debugging sync issues requires understanding Yjs internals and the binary protocol
When to use
- Building collaborative features with Yjs and need a server for relay and persistence
- Need authentication before granting document access
- Running multiple server instances that need to stay in sync
- Want debounced persistence without building custom flush logic
When NOT to use
- Non-Yjs collaboration (use Socket.IO or a custom WebSocket server)
- Simple real-time features like presence indicators (use Supabase Realtime or Pusher)
- Read-only document viewing with no editing
- Teams that need a server in Go, Rust, or Java
Key Points
- Hocuspocus is to Yjs what Express is to HTTP. Yjs handles the CRDT math. Hocuspocus handles the server lifecycle: authentication, persistence, multi-server sync, and connection management
- The sync handshake is a state vector exchange. Each side sends its state vector (what it has seen), and the other side computes the diff (what the peer is missing). Only missing updates are sent, as a binary blob roughly 10x smaller than the JSON equivalent
- Documents live in memory while active. When the last client disconnects, Hocuspocus flushes to the persistence layer and unloads the document. This keeps memory usage proportional to active documents, not total documents
- Persistence is debounced by default (2 seconds of inactivity). During active editing with 50 users, this means at most about one database write every 2 seconds per document instead of hundreds per second
- Multi-server coordination uses Redis pub/sub. Every Yjs update is published to a channel named after the document. All servers subscribe. The latency overhead is typically 1-2ms per update
- The hook lifecycle provides control at every stage: onConnect (new WebSocket connection), onAuthenticate (verify JWT), onLoadDocument (fetch from DB), onChange (every edit), onStoreDocument (persist), onDisconnect (cleanup)
Common Mistakes
- Persisting on every Yjs update instead of using the debounce. With 100 active editors, that is hundreds of writes per second to the database. Use the default debounce or tune it for the workload
- Running multiple servers without Redis pub/sub. Users on different servers editing the same document will diverge silently. Their edits only merge when someone reconnects
- Skipping the onAuthenticate hook. Without it, anyone with the document URL can connect and edit. Hocuspocus does not enforce auth by default
- Ignoring reconnection handling. Clients disconnect constantly (network switches, laptop sleep, mobile backgrounding). The sync protocol handles this gracefully, but the UI needs to show connection status
- Storing documents as JSON instead of binary Yjs snapshots. The binary format is what Yjs produces natively. Converting to JSON for storage and back to binary on load wastes CPU and can introduce subtle encoding bugs