Operational Transformation
Architecture
What OT Actually Does
You and a coworker are editing the same document. You type "Hello" at the start. At the same moment, your coworker deletes a word in the middle of the document. Both edits happen on your local copies before either one reaches the server.
The problem: your coworker's delete was calculated against the old document, before your "Hello" was inserted. The character positions in their delete operation are now wrong because your insert shifted everything over by five characters.
OT fixes this by transforming one operation against the other. When the server receives both operations, it adjusts the positions in the second operation to account for the first. Your coworker's delete gets its position shifted by +5 to account for your insert. Both clients end up with the same document.
This sounds simple, and for two operations it is. The complexity explodes when you have dozens of users making edits simultaneously, each edit needing transformation against every other concurrent edit.
The Transform Function
The core of OT is a function transform(op1, op2) that takes two concurrent operations and returns a modified pair (op1', op2') such that applying op1 then op2' produces the same result as applying op2 then op1'.
For text editing, you only need two primitive operations: insert(char, position) and delete(position). The transform rules for these are straightforward:
insert vs insert: If the inserts are at different positions, the one at the higher position gets incremented by 1 when transformed against the other (the earlier insert shifted it right). If both insert at the same position, pick a tiebreaker (usually lower client ID goes first); the losing insert gets its position incremented by 1.
insert vs delete: This pair has two directions. Transforming the delete against the insert: if the insert position is at or before the delete position, increment the delete position by 1. Transforming the insert against the delete: if the insert position is after the delete position, decrement the insert position by 1.
delete vs delete: If both delete the same position, one becomes a no-op (the character is already gone). If they delete different positions, the one at the higher position gets decremented by 1.
Four rules. That is the entire transform function for plain text. You could implement this in an afternoon.
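The four rules above fit in a few lines. Here is a minimal sketch, not a production transform: operations are plain tuples, ("ins", pos, char) or ("del", pos), and the op1_wins flag stands in for the client-ID tiebreaker.

```python
def transform(op1, op2, op1_wins=True):
    """Return op1', op1 adjusted so it can apply after concurrent op2."""
    if op1 is None or op2 is None:          # no-ops pass through unchanged
        return op1
    if op1[0] == "ins" and op2[0] == "ins":
        # Equal positions fall back to the tiebreaker (stand-in for client ID).
        if op1[1] < op2[1] or (op1[1] == op2[1] and op1_wins):
            return op1
        return ("ins", op1[1] + 1, op1[2])  # op2 inserted at/before us: shift right
    if op1[0] == "ins":                     # insert vs delete
        return ("ins", op1[1] - 1, op1[2]) if op1[1] > op2[1] else op1
    if op2[0] == "ins":                     # delete vs insert
        return ("del", op1[1] + 1) if op1[1] >= op2[1] else op1
    if op1[1] == op2[1]:                    # delete vs delete, same character
        return None                         # already gone: become a no-op
    return ("del", op1[1] - 1) if op1[1] > op2[1] else op1

def apply_op(doc, op):
    """Apply one operation (or a no-op) to a string."""
    if op is None:
        return doc
    if op[0] == "ins":
        return doc[:op[1]] + op[2] + doc[op[1]:]
    return doc[:op[1]] + doc[op[1] + 1:]

# The motivating example: you insert at the front, a coworker deletes later on.
doc = "The quick fox"
yours = ("ins", 0, "!")        # prepend a character
theirs = ("del", 4)            # delete the "q" of "quick"

# Your replica: apply yours, then their op transformed against yours.
a = apply_op(apply_op(doc, yours), transform(theirs, yours, False))
# Their replica: apply theirs, then your op transformed against theirs.
b = apply_op(apply_op(doc, theirs), transform(yours, theirs, True))
assert a == b == "!The uick fox"   # both replicas converge
```

Note how the coworker's delete position shifts from 4 to 5 on your replica, exactly the adjustment described earlier.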
The trap is that these four rules need to be provably correct for any sequence of concurrent operations, not just pairs. And that is where things get ugly.
Why OT Needs a Central Server
In a peer-to-peer system, operations can arrive in any order at any node. Client A sends an insert to clients B and C. Client B transforms it and sends the result to client C. Client C also received the original from A and transforms it independently. Now client C has two versions of the transformed operation and needs to reconcile them.
This is the TP2 (Transformation Property 2) problem. TP1 says: transforming and applying two concurrent ops in either order gives the same result. TP2 extends this to three or more concurrent ops, requiring that transformation is consistent regardless of which intermediate path you take through the transformation graph.
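TP1 can be checked mechanically by brute force. The sketch below, assuming the tuple-based ins/del operations and pairwise rules described in the text, enumerates every pair of concurrent operations on a tiny document and verifies that both transformation orders converge. It says nothing about TP2, which involves three or more operations and is exactly where naive character transforms have historically failed.

```python
from itertools import product

def transform(op1, op2, op1_wins=True):
    """op1 adjusted to apply after concurrent op2 (plain-text ins/del)."""
    if op1 is None or op2 is None:
        return op1
    if op1[0] == "ins" and op2[0] == "ins":
        if op1[1] < op2[1] or (op1[1] == op2[1] and op1_wins):
            return op1
        return ("ins", op1[1] + 1, op1[2])
    if op1[0] == "ins":
        return ("ins", op1[1] - 1, op1[2]) if op1[1] > op2[1] else op1
    if op2[0] == "ins":
        return ("del", op1[1] + 1) if op1[1] >= op2[1] else op1
    if op1[1] == op2[1]:
        return None
    return ("del", op1[1] - 1) if op1[1] > op2[1] else op1

def apply_op(doc, op):
    if op is None:
        return doc
    if op[0] == "ins":
        return doc[:op[1]] + op[2] + doc[op[1]:]
    return doc[:op[1]] + doc[op[1] + 1:]

def tp1_holds(doc, op1, op2):
    """TP1: op1 then op2' equals op2 then op1' (op1 wins position ties)."""
    left = apply_op(apply_op(doc, op1), transform(op2, op1, False))
    right = apply_op(apply_op(doc, op2), transform(op1, op2, True))
    return left == right

# Every insert position and delete position over a 3-character document.
doc = "abc"
ops1 = ([("ins", p, "X") for p in range(len(doc) + 1)]
        + [("del", p) for p in range(len(doc))])
ops2 = ([("ins", p, "Y") for p in range(len(doc) + 1)]
        + [("del", p) for p in range(len(doc))])
assert all(tp1_holds(doc, a, b) for a, b in product(ops1, ops2))
```

Passing this exhaustive pairwise check is necessary but nowhere near sufficient: TP2 counterexamples only appear when a third concurrent operation enters the picture.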
TP2 is brutally hard to satisfy. Multiple published OT algorithms that were "proven correct" were later found to violate TP2 in edge cases. The dOPT algorithm, one of the earliest, had a known bug. The adOPTed algorithm fixed it but introduced another. The GOT and GOTO algorithms attempted general solutions but added significant complexity.
The Jupiter protocol, developed at Xerox PARC and later adopted by Google, sidesteps the entire problem. It avoids TP2 by using a central server as the single ordering authority. Every operation goes through the server. The server applies operations in a single canonical order. Clients never need to reconcile operations that took different transformation paths because there is only one path: through the server.
This is the key architectural constraint of OT. You need a server (or some equivalent total ordering mechanism). Take the server away and you are fighting a correctness problem that has defeated some very smart researchers.
The Jupiter Protocol
Jupiter, the protocol behind Google Docs' collaboration layer, is elegant in its simplicity.
Each client maintains a local copy of the document and a revision number. When you type something, the client applies it immediately to your local copy (so you see zero latency) and sends the operation to the server tagged with the revision number you were at when you made the edit.
The server maintains the canonical document and a log of all operations in order. When it receives your operation, it transforms it against every operation that happened between your revision and the current revision. Then it applies the transformed operation and broadcasts it to all other clients.
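The server's job fits in a short sketch. This is illustrative only: the class and method names are hypothetical, operations are ("ins", pos, char) / ("del", pos) tuples, and the transform follows the pairwise rules described in the text, with operations already in the log winning position ties.

```python
def transform(op, past):
    """op adjusted to apply after past; the logged op wins position ties."""
    if op is None or past is None:
        return op
    if op[0] == "ins" and past[0] == "ins":
        return op if op[1] < past[1] else ("ins", op[1] + 1, op[2])
    if op[0] == "ins":
        return ("ins", op[1] - 1, op[2]) if op[1] > past[1] else op
    if past[0] == "ins":
        return ("del", op[1] + 1) if op[1] >= past[1] else op
    if op[1] == past[1]:
        return None
    return ("del", op[1] - 1) if op[1] > past[1] else op

def apply_op(doc, op):
    if op is None:
        return doc
    if op[0] == "ins":
        return doc[:op[1]] + op[2] + doc[op[1]:]
    return doc[:op[1]] + doc[op[1] + 1:]

class JupiterServer:
    """Canonical document plus an ordered log of every applied operation."""
    def __init__(self, doc=""):
        self.doc = doc
        self.log = []

    def receive(self, op, client_rev):
        # Transform past every operation the sending client had not yet seen.
        for past in self.log[client_rev:]:
            op = transform(op, past)
        self.log.append(op)                 # op now fits the canonical order
        self.doc = apply_op(self.doc, op)
        return op                           # broadcast this to all clients

server = JupiterServer("abc")
server.receive(("del", 1), client_rev=0)            # client A deletes "b"
op = server.receive(("ins", 2, "X"), client_rev=0)  # client B inserts before "c"
assert server.doc == "aXc" and op == ("ins", 1, "X")
```

Client B computed its insert against "abc", but the delete from client A shifted everything left; the server's transform loop repairs the position before the operation enters the canonical log.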
Other clients receive the server's broadcast and transform it against any local operations they have made but the server has not yet acknowledged. This is the "client prediction" part. You keep typing while waiting for the server, and reconcile when the server responds.
The result: every user sees their own edits instantly. Server round-trip latency only affects seeing other people's edits. The server guarantees a single canonical history, so convergence is straightforward.
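The client side can be sketched the same way. Again the names are hypothetical and the operations are the tuple encoding used throughout: the key moves are applying local edits instantly, and transforming each server broadcast past the queue of unacknowledged local operations, with the server's operation winning position ties so client and server tiebreak consistently.

```python
def transform(op1, op2, op1_wins=True):
    """op1 adjusted to apply after concurrent op2 (plain-text ins/del)."""
    if op1 is None or op2 is None:
        return op1
    if op1[0] == "ins" and op2[0] == "ins":
        if op1[1] < op2[1] or (op1[1] == op2[1] and op1_wins):
            return op1
        return ("ins", op1[1] + 1, op1[2])
    if op1[0] == "ins":
        return ("ins", op1[1] - 1, op1[2]) if op1[1] > op2[1] else op1
    if op2[0] == "ins":
        return ("del", op1[1] + 1) if op1[1] >= op2[1] else op1
    if op1[1] == op2[1]:
        return None
    return ("del", op1[1] - 1) if op1[1] > op2[1] else op1

def apply_op(doc, op):
    if op is None:
        return doc
    if op[0] == "ins":
        return doc[:op[1]] + op[2] + doc[op[1]:]
    return doc[:op[1]] + doc[op[1] + 1:]

class JupiterClient:
    def __init__(self, doc=""):
        self.doc = doc
        self.revision = 0    # last server revision this client has seen
        self.pending = []    # sent to the server, not yet acknowledged

    def local_edit(self, op):
        """Apply instantly (zero perceived latency), then queue for the server."""
        self.doc = apply_op(self.doc, op)
        self.pending.append(op)
        return {"rev": self.revision, "op": op}   # message for the server

    def remote_edit(self, op):
        """A server broadcast: transform it past our unacknowledged local ops."""
        for i, mine in enumerate(self.pending):
            # The server's op wins position ties, matching the server's tiebreak.
            op, self.pending[i] = (transform(op, mine, True),
                                   transform(mine, op, False))
        self.revision += 1
        self.doc = apply_op(self.doc, op)

    def ack(self):
        """The server accepted our oldest pending op into the canonical log."""
        self.pending.pop(0)
        self.revision += 1

client = JupiterClient("abc")
client.local_edit(("ins", 3, "!"))   # keep typing before the server replies
assert client.doc == "abc!"
client.remote_edit(("del", 0))       # someone else's edit arrives meanwhile
assert client.doc == "bc!" and client.pending == [("ins", 2, "!")]
```

Notice that the pending insert is rewritten from position 3 to position 2 as the remote delete passes through, so when the server finally acknowledges it, both sides agree on where it landed.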
The downside is obvious. If the server is down, nobody can collaborate. If the server is slow, other people's edits lag. If you want offline editing, you need a separate mechanism to queue operations and replay them when connectivity returns. Google Docs handles short disconnections with operation buffering, but extended offline editing is not its strong suit.
Correctness: Why It Matters
The history of OT is littered with algorithms that looked correct but were not.
Ellis and Gibbs published dOPT in 1989. It worked for two concurrent operations but failed for three. Ressel et al. published adOPTed in 1996 to fix it, but their TP2 proof had a gap. Sun et al. proposed GOT/GOTO in 1998 with a general transformation framework, but the implementation complexity was substantial.
The root problem: proving that a set of transform functions satisfies TP2 for all possible operation sequences is combinatorially explosive. You are not just checking pairs of operations. You are checking that every possible ordering of N concurrent operations, transformed through every possible path, arrives at the same result. The number of paths grows factorially.
Jupiter dodges this entirely. With a central server defining the canonical order, there is only one transformation path per operation. TP1 is sufficient. You never need TP2 because the server serializes everything.
This is why virtually every production OT system uses a central server. The theoretical elegance of peer-to-peer OT cannot overcome the practical impossibility of guaranteeing TP2 for complex operation types.
Beyond Plain Text
Google Sheets uses OT for spreadsheet operations: insert row, delete column, set cell value, merge cells. Each new operation type creates new transform function pairs. With 10 operation types, you need transform functions for all 55 pairs (10 choose 2, plus 10 same-type pairs). Every pair needs to be correct. Every pair needs testing.
Rich text editing adds formatting operations: bold a range, set font size on a range, create a link. These interact with insert and delete in non-obvious ways. If I bold characters 5-10 and you simultaneously insert a character at position 7, does the new character get bold? (Yes, usually.) What if you insert at position 5 exactly? (Depends on the implementation.)
Tree-structured documents (like Google Docs' internal representation) add move operations, and those bring new headaches. Moving a subtree while someone else edits a node within it requires careful transform logic to maintain the tree's structural integrity.
Each extension is doable but expensive. The testing surface grows quadratically with operation types, and edge cases hide in the combinations that nobody thought to test.
OT vs CRDTs
OT and CRDTs solve the same problem (concurrent editing with convergence) but with fundamentally different architectures.
OT transforms operations against a canonical order. It needs a server. It sends small operation messages. It has 20 years of production track record in Google Docs. The math is hard to prove correct, but the Jupiter protocol makes it manageable by centralizing the ordering.
CRDTs guarantee convergence through algebraic properties of the data structure itself. No server required. Replicas can sync in any order, at any time, through any topology. The trade-off is metadata overhead (tombstones, version vectors, tag sets) and the constraint that your data must fit into a CRDT-compatible type.
When to pick OT: you have a server, you need real-time collaborative editing, your data is text or structured documents, and you want minimal bandwidth per edit. Google Docs is the proof that this works at massive scale.
When to pick CRDTs: you need peer-to-peer sync, offline-first editing, multi-datacenter replication, or you are building a system where no single server should be a point of failure. Figma, Automerge, and Yjs prove this works too.
The lines are blurring. Yjs is technically a CRDT library but borrows ideas from OT. Automerge started as pure CRDT but now has optimizations inspired by OT's operation-shipping model. The next generation of collaborative editing frameworks will likely be hybrids, taking the best of both approaches.
The Practical Landscape Today
If you are building a collaborative editor in 2025, your realistic options are:
Use a library. Yjs (CRDT-based), Automerge (CRDT-based), or ShareDB (OT-based) give you collaborative editing without implementing transform functions yourself. This is the right choice for almost everyone.
Use a hosted service. Liveblocks, Tiptap Cloud, or Firebase Realtime Database handle the sync layer so you do not have to.
Roll your own. Only if you have a very specific data model that existing libraries do not support. Expect to spend months on correctness testing. Google has an entire team maintaining their OT implementation.
The days of implementing OT from scratch for a side project are over. The algorithm is well-understood, the libraries are mature, and the hard problems (correctness proofs, garbage collection, offline reconciliation) have been solved by people who spent years on them. Pick a library, understand the trade-offs, and build your product.
Key Points
- •OT lets multiple users edit the same document simultaneously by transforming each operation against every concurrent operation before applying it. The transform adjusts character positions so that every client converges to the same final document, even when edits arrive out of order
- •OT requires a central server (or at least a total ordering mechanism) to sequence operations. The server is the single source of truth for operation order, which makes correctness tractable but means offline editing and peer-to-peer topologies are not naturally supported
- •The correctness of OT depends on transformation functions satisfying TP1 (transform-then-apply equals apply-then-transform for any pair of concurrent operations). TP2 extends this to three or more concurrent operations. Multiple published algorithms were later found to violate TP2
- •Google Docs has run OT in production since 2006, making it the most battle-tested approach for real-time collaborative text editing. The Jupiter protocol (client prediction + server canonicalization) handles latency by letting clients apply their own edits immediately and reconcile later
Common Mistakes
- ✗Writing transform functions that handle insert-insert and insert-delete but forget delete-delete interactions at the same position. Edge cases around overlapping deletions are where most OT bugs hide
- ✗Trying to build OT without a central server. Pure peer-to-peer OT exists in theory, but the TP2 correctness requirement makes it extremely hard to get right. Most peer-to-peer implementations that claim OT are actually doing something closer to CRDTs
- ✗Assuming OT is only for plain text. Google Docs uses OT for rich text with formatting, and Google Sheets extends it to spreadsheet operations. But every new operation type multiplies the number of transform function pairs you need to write and prove correct
- ✗Not implementing operation compression for undo. A naive undo stack stores raw operations, but transformed operations change shape as they pass through the system. You need inverse transforms that account for the operation's history, not just the original