Collaborative Editor
Google Docs at its core. Operational Transformation vs CRDTs — the real question is how two people typing at the same position converge to the same document.
Key Abstractions
A single edit: insert(pos, text), delete(pos, len). Carries an author ID and vector clock.
The converged text state. Applies operations and exposes content.
Transforms concurrent operations so they commute. Insert at 5 + insert at 3 → adjust the 5 to 6.
One client session. Owns its local doc, vector clock, and pending-op buffer.
Ordering authority. Receives ops from all sites, transforms, broadcasts.
Class Diagram
The Key Insight
Two users typing at the same position is the cruel version of this problem. Alice inserts "cat" at index 5 on her local copy. Bob simultaneously inserts "dog" at index 5 on his. No central clock arbitrates. If the server just broadcasts both ops as-is, Alice applies Bob's (now stale) insert and her document diverges from his.
Operational Transformation rewrites an op before applying it so the result is the same regardless of which order the two users saw them in. If Alice's op arrives at Bob's site after Bob already applied his own, Alice's op gets transformed against Bob's — position shifts, length adjusts — and the final string is identical on both sites.
Vector clocks are the causality primitive. Wall clocks lie when networks are involved; Lamport timestamps don't capture concurrency precisely. A vector clock per author tells the server exactly which ops a given op has already seen, so it knows which ops to transform against.
This is a sketch of the real thing. Production OT has decades of subtlety (TP1/TP2 properties, inclusion transformation vs exclusion). CRDTs (like Yjs, Automerge) bypass OT entirely by making operations commute by construction — worth mentioning in the interview as an alternative.
Requirements
Functional
- Multiple sites edit a shared document concurrently
- Every site converges to the same final document
- Local edits apply instantly (no server round-trip delay for typing)
- Deleted and inserted ranges coexist without corrupting the document
Non-Functional
- Transformation preserves user intent as much as possible
- Deterministic convergence (same final state on every site)
- No lost updates under normal network conditions
- Graceful handling of reordered message delivery
Design Decisions
Why OT over CRDTs?
OT is easier to explain in an interview. CRDTs (lists of tombstoned characters with dense identifiers) require deeper setup — Logoot, YATA, RGA. Neither is strictly better; Google Docs ships OT, Figma ships CRDTs. Modern advice leans CRDT because it's compositional and easier to reason about under partition.
Why vector clocks instead of Lamport timestamps?
Lamport captures ordering but not concurrency. Two ops with Lamport timestamps 5 and 7 might be causally concurrent or causally ordered — the single number can't tell. Vector clocks capture both.
Why apply local ops immediately?
A collab editor must feel like a normal editor. Every millisecond of input latency shows up in user satisfaction. Local-first application makes typing feel native. The reconciliation layer handles convergence after the fact.
Why tie-break by author ID on same-position inserts?
Without a tie-breaker, "insert 'cat' at 5" and "insert 'dog' at 5" can converge to "catdog" on one site and "dogcat" on the other. Sorting by author ID is arbitrary but deterministic — every site picks the same winner.
Interview Follow-ups
- "How would this scale to 100 editors?" OT's transformation cost is O(pending_ops) per incoming op. Practical collaboration rarely exceeds 20 active editors; past that, CRDTs win on the pathological cases.
- "What about offline editing?" Accumulate ops locally with a vector clock. On reconnect, flush the queue; server transforms each against the ops it accepted meanwhile. No data loss if vector clocks are maintained correctly.
- "How does rich-text formatting fit?" Treat formatting as a third op type —
SetAttribute(range, key, value). Transforms follow the same insert/delete adjustments. - "Why not just last-writer-wins?" For scalar fields yes, but for text it's a disaster — one user's keystroke obliterates another user's paragraph. Merge semantics must preserve both users' intents where possible.
- "How would you handle undo?" Per-user undo stacks applied as inverse ops. Undoing your own op at position 5 becomes an insert/delete broadcast like any other. Other users' concurrent edits have already shifted your op's effective position — transform the undo accordingly.
Code Implementation
1 from __future__ import annotations
2 from abc import ABC, abstractmethod
3 from dataclasses import dataclass, field
4 from typing import Callable
5
6
7 class VectorClock:
8 """Per-site logical clock. Captures causality between operations."""
9
10 def __init__(self, ticks: dict[str, int] | None = None):
11 self._ticks: dict[str, int] = dict(ticks or {})
12
13 def tick(self, site: str) -> "VectorClock":
14 new = VectorClock(self._ticks)
15 new._ticks[site] = new._ticks.get(site, 0) + 1
16 return new
17
18 def merge(self, other: "VectorClock") -> "VectorClock":
19 merged: dict[str, int] = {}
20 for s in self._ticks.keys() | other._ticks.keys():
21 merged[s] = max(self._ticks.get(s, 0), other._ticks.get(s, 0))
22 return VectorClock(merged)
23
24 def happens_before(self, other: "VectorClock") -> bool:
25 # self < other if every entry <= and at least one <.
26 strictly_less = False
27 for s in self._ticks.keys() | other._ticks.keys():
28 a = self._ticks.get(s, 0)
29 b = other._ticks.get(s, 0)
30 if a > b:
31 return False
32 if a < b:
33 strictly_less = True
34 return strictly_less
35
36 def is_concurrent_with(self, other: "VectorClock") -> bool:
37 return not self.happens_before(other) and not other.happens_before(self) and self._ticks != other._ticks
38
39 def get(self, site: str) -> int:
40 return self._ticks.get(site, 0)
41
42 def __repr__(self) -> str:
43 return f"VC({self._ticks})"
44
45
46 class Operation(ABC):
47 def __init__(self, author: str, clock: VectorClock):
48 self.author = author
49 self.clock = clock
50
51 @abstractmethod
52 def apply(self, doc: "Document") -> None: ...
53
54 @abstractmethod
55 def transform_against(self, other: "Operation") -> "Operation":
56 """Return self adjusted so it can be applied after `other` was applied."""
57 ...
58
59
60 class InsertOp(Operation):
61 def __init__(self, position: int, text: str, author: str, clock: VectorClock):
62 super().__init__(author, clock)
63 self.position = position
64 self.text = text
65
66 def apply(self, doc: "Document") -> None:
67 doc.insert(self.position, self.text)
68
69 def transform_against(self, other: Operation) -> "InsertOp":
70 if isinstance(other, InsertOp):
71 # If other inserted strictly before self, self shifts right by other's length.
72 if other.position < self.position:
73 return InsertOp(self.position + len(other.text), self.text, self.author, self.clock)
74 # Same position → tie-break by author ID so both sites converge.
75 if other.position == self.position and other.author < self.author:
76 return InsertOp(self.position + len(other.text), self.text, self.author, self.clock)
77 return self
78 if isinstance(other, DeleteOp):
79 if other.position + other.length <= self.position:
80 return InsertOp(self.position - other.length, self.text, self.author, self.clock)
81 if other.position >= self.position:
82 return self
83 # Deletion straddles our position — clip to the deletion start.
84 return InsertOp(other.position, self.text, self.author, self.clock)
85 return self
86
87 def __repr__(self) -> str:
88 return f"Ins({self.position!r}, {self.text!r}, {self.author})"
89
90
91 class DeleteOp(Operation):
92 def __init__(self, position: int, length: int, author: str, clock: VectorClock):
93 super().__init__(author, clock)
94 self.position = position
95 self.length = length
96
97 def apply(self, doc: "Document") -> None:
98 doc.delete(self.position, self.length)
99
100 def transform_against(self, other: Operation) -> "DeleteOp":
101 if isinstance(other, InsertOp):
102 if other.position <= self.position:
103 return DeleteOp(self.position + len(other.text), self.length, self.author, self.clock)
104 return self
105 if isinstance(other, DeleteOp):
106 other_end = other.position + other.length
107 self_end = self.position + self.length
108 if other_end <= self.position:
109 return DeleteOp(self.position - other.length, self.length, self.author, self.clock)
110 if other.position >= self_end:
111 return self
112 # Overlapping deletes — clip our range to the non-intersecting part.
113 new_start = min(self.position, other.position)
114 overlap = min(self_end, other_end) - max(self.position, other.position)
115 new_length = max(0, self.length - overlap)
116 return DeleteOp(new_start, new_length, self.author, self.clock)
117 return self
118
119 def __repr__(self) -> str:
120 return f"Del({self.position}, len={self.length}, {self.author})"
121
122
123 class Document:
124 def __init__(self, initial: str = ""):
125 self._buf = list(initial)
126
127 def insert(self, pos: int, text: str) -> None:
128 if not 0 <= pos <= len(self._buf):
129 raise IndexError("insert out of bounds")
130 self._buf[pos:pos] = list(text)
131
132 def delete(self, pos: int, length: int) -> None:
133 if pos < 0 or pos + length > len(self._buf):
134 # Clip rather than throw — ops arriving in reordered streams can be slightly stale.
135 length = max(0, min(length, len(self._buf) - pos))
136 del self._buf[pos:pos + length]
137
138 def content(self) -> str:
139 return "".join(self._buf)
140
141
142 class OTEngine:
143 @staticmethod
144 def transform(local: Operation, against: list[Operation]) -> Operation:
145 """Transform `local` past every op in `against` in order."""
146 transformed = local
147 for prior in against:
148 transformed = transformed.transform_against(prior)
149 return transformed
150
151
152 class Server:
153 """Ordering authority. Applies ops sequentially and broadcasts."""
154
155 def __init__(self, initial: str = ""):
156 self._doc = Document(initial)
157 self._log: list[Operation] = []
158 self._subscribers: list[Callable[[Operation], None]] = []
159
160 @property
161 def content(self) -> str: return self._doc.content()
162
163 def subscribe(self, handler: Callable[[Operation], None]) -> None:
164 self._subscribers.append(handler)
165
166 def receive(self, op: Operation) -> Operation:
167 """Accept an op from some site. Returns the transformed op actually applied."""
168 # Transform incoming op against any ops the sender hasn't seen.
169 unseen = [prior for prior in self._log if not prior.clock.happens_before(op.clock) and prior.clock != op.clock]
170 transformed = OTEngine.transform(op, unseen)
171 transformed.apply(self._doc)
172 self._log.append(transformed)
173 # Broadcast the *transformed* op so every peer sees the same thing.
174 for sub in self._subscribers:
175 if sub.__self__.site_id != transformed.author: # type: ignore[attr-defined]
176 sub(transformed)
177 return transformed
178
179
180 class Site:
181 """One client. Applies locally first; reconciles with server on ACK."""
182
183 def __init__(self, site_id: str, server: Server):
184 self.site_id = site_id
185 self._doc = Document(server.content)
186 self._clock = VectorClock()
187 self._server = server
188 self._server.subscribe(self._on_remote)
189
190 def content(self) -> str: return self._doc.content()
191
192 def insert(self, pos: int, text: str) -> None:
193 self._clock = self._clock.tick(self.site_id)
194 op = InsertOp(pos, text, self.site_id, self._clock)
195 op.apply(self._doc)
196 self._server.receive(op)
197
198 def delete(self, pos: int, length: int) -> None:
199 self._clock = self._clock.tick(self.site_id)
200 op = DeleteOp(pos, length, self.site_id, self._clock)
201 op.apply(self._doc)
202 self._server.receive(op)
203
204 def _on_remote(self, op: Operation) -> None:
205 if op.author == self.site_id:
206 return # our own op, already applied locally
207 op.apply(self._doc)
208 self._clock = self._clock.merge(op.clock)
209
210
211 if __name__ == "__main__":
212 server = Server(initial="Hello world")
213 alice = Site("A", server)
214 bob = Site("B", server)
215
216 # Both edit concurrently.
217 alice.insert(6, "beautiful ") # "Hello beautiful world"
218 bob.insert(5, ",") # Bob sees Alice's "Hello world" still (before alice op arrives).
219
220 print(f"Server: {server.content!r}")
221 print(f"Alice: {alice.content()!r}")
222 print(f"Bob: {bob.content()!r}")
223
224 # All three converge eventually — server applies ops in order and broadcasts transformed versions.
225 # Bob's subsequent remote receive picks up Alice's op.
226 assert server.content == alice.content() == bob.content(), \
227 f"Divergence: server={server.content!r}, alice={alice.content()!r}, bob={bob.content()!r}"
228 print(f"Converged: {server.content!r}")Common Mistakes
- ✗Applying raw ops from another user without transformation. Doc states diverge immediately.
- ✗Using server time as the ordering. Vector clocks capture causality correctly; wall time doesn't.
- ✗Blocking local edits until server ACK. Typing feels laggy; collab editors live or die on local-first feel.
- ✗Forgetting to reapply pending ops after a server-side rebase. Local user sees their last edit disappear.
Key Points
- ✓OT's core rule: transform(opA, opB) must satisfy opA·T(opB) ≡ opB·T(opA) — both users converge.
- ✓Every op carries a vector clock so the server knows its causal context.
- ✓Client applies local ops immediately, then reconciles when server confirms.
- ✓Tie-break on concurrent inserts at the same position by author ID — deterministic, unambiguous.