Connections, feed, messaging, and search. The connections graph is the hard part — 1st/2nd/3rd-degree expansion is where most designs fall apart at scale.
Key Abstractions
A member's identity: headline, experience, skills
Undirected graph of accepted connections. Backs 1st/2nd/3rd-degree queries.
State machine: pending → accepted / ignored / withdrawn
Per-member ranked timeline of posts from connections
Conversation between two or more members with delivered/read receipts
Strategy that scores posts — chronological, engagement-based, or affinity
Class Diagram
The Key Insight
The connections graph drives everything LinkedIn does. Feed (posts from connections), messaging (can only message 1st-degree by default), People You May Know (2nd-degree with mutual counts), even profile visibility (full profile to 1st, partial to 2nd). The graph is the spine.
Keeping it undirected is the single best simplification. A LinkedIn connection is mutual by definition — Alice-Bob and Bob-Alice are the same edge. Directed edges (Twitter's follow model) need twice the storage and force duplicate logic for every "are we connected" query. Undirected with a single connect(a, b) that populates both adjacency sets is clean and fast.
Degree computation bounded to 3 is a performance contract, not a feature limit. The graph of 1 billion members with an average of 500 connections means 2nd-degree = 250,000 and 3rd-degree = 125 million. Past degree-3 the set is nearly the whole platform — meaningless to compute and pointlessly expensive. The API admits this by capping.
Feed ranking as a Strategy is the second architectural hinge. LinkedIn A/B tests feed ranking constantly; coupling the ranker to the feed class means every experiment is a full deploy. Strategy plus dependency injection means a new ranker is a new class.
Requirements
Functional
- Register members with profiles
- Send, accept, ignore, or withdraw connection requests
- Compute degree-of-separation bounded to 3
- Publish posts; display a personalized feed of connections' posts
- Send direct messages with delivery and read receipts
- "People You May Know" suggestions
Non-Functional
- O(1) connection check between two members
- Degree-2/3 computation fast enough for profile page load
- Feed fetch in sub-100ms for typical member (few hundred connections)
- Messages append-only; history is immutable
Design Decisions
Why undirected connections?
LinkedIn's product semantics: a connection is mutual consent. Encoding directionality creates ambiguity ("A follows B but not vice versa" — which one is the connection?). Undirected graph matches the domain.
Why BFS for degree?
Degree-of-separation is a shortest-path query with unweighted edges. BFS is optimal. Bounding to max_degree = 3 caps the explosion. More specialized approaches (landmark embeddings, compressed graph) live at the infrastructure layer, not the model layer.
Why separate Feed from FeedRanker?
The feed is "gather posts from connections." The ranker is "decide which ones to show first." Separating them means the gathering logic (which is about connections) stays stable while ranking experiments churn. Same Feed.for_member() call, different scoring.
Why append-only messages?
Editing a message in place changes history. A recipient who already saw "meet at noon" shouldn't find it saying "meet at midnight" without a visible edit marker. Append-only plus an edits list gives tamper-evident conversation history.
Why PYMK as 2nd-degree with mutual count?
A 2nd-degree connection with many mutuals is a very likely real-world acquaintance. Sorting by mutual count is a cheap, effective baseline. Production adds school/company overlap, co-visited profiles, and embeddings — but the graph-based core holds.
Interview Follow-ups
- "How would you scale the feed to 1B members?" Hybrid pull (most members, compute on read) + push (highly-active accounts, precompute into delivery queues). Celebrities with millions of connections break pure push — store their posts once and pull on read.
- "How does feed deduplication work?" Bloom filter per viewer tracks seen post IDs. Feed generation skips any post already seen in the last N days.
- "What about privacy settings?"
VisibilityScopeenum per field (public, connections, me). Profile rendering checks viewer's degree to profile owner against the scope for each field. - "How would you implement real-time connection notifications?" Server-side event pushed through a WebSocket or long-poll channel when a request is sent/accepted. Falls back to email if offline.
- "How do recruiter searches work?" Inverted index (Elasticsearch) on profile fields. Filters like "2nd-degree" join against the connection graph post-query — or precomputed degree-2 sets per recruiter.
Code Implementation
1 from __future__ import annotations
2 from abc import ABC, abstractmethod
3 from collections import defaultdict, deque
4 from dataclasses import dataclass, field
5 from datetime import datetime, timedelta
6 from enum import Enum
7 from heapq import nlargest
8 from threading import RLock
9 import uuid
10
11
12 class RequestStatus(Enum):
13 PENDING = "pending"
14 ACCEPTED = "accepted"
15 IGNORED = "ignored"
16 WITHDRAWN = "withdrawn"
17
18
19 @dataclass
20 class Profile:
21 name: str
22 headline: str = ""
23 skills: list[str] = field(default_factory=list)
24
25
26 @dataclass
27 class Member:
28 id: str
29 profile: Profile
30
31
32 class ConnectionGraph:
33 """Undirected edges. Adjacency set keyed by member ID."""
34
35 def __init__(self):
36 self._edges: dict[str, set[str]] = defaultdict(set)
37 self._lock = RLock()
38
39 def connect(self, a: str, b: str) -> None:
40 if a == b:
41 raise ValueError("cannot connect a member to themselves")
42 with self._lock:
43 self._edges[a].add(b)
44 self._edges[b].add(a)
45
46 def disconnect(self, a: str, b: str) -> None:
47 with self._lock:
48 self._edges[a].discard(b)
49 self._edges[b].discard(a)
50
51 def are_connected(self, a: str, b: str) -> bool:
52 with self._lock:
53 return b in self._edges[a]
54
55 def neighbors(self, a: str) -> set[str]:
56 with self._lock:
57 return set(self._edges[a])
58
59 def degree_between(self, a: str, b: str, max_degree: int = 3) -> int | None:
60 """BFS bounded to max_degree. None if not reachable within that distance."""
61 if a == b:
62 return 0
63 with self._lock:
64 visited = {a}
65 frontier = deque([(a, 0)])
66 while frontier:
67 current, depth = frontier.popleft()
68 if depth >= max_degree:
69 continue
70 for neighbor in self._edges[current]:
71 if neighbor == b:
72 return depth + 1
73 if neighbor not in visited:
74 visited.add(neighbor)
75 frontier.append((neighbor, depth + 1))
76 return None
77
78 def suggest_connections(self, a: str, limit: int = 10) -> list[str]:
79 """People You May Know: 2nd-degree connections ranked by mutual count."""
80 with self._lock:
81 first = self._edges[a]
82 mutual_count: dict[str, int] = defaultdict(int)
83 for friend in first:
84 for friend_of_friend in self._edges[friend]:
85 if friend_of_friend == a or friend_of_friend in first:
86 continue
87 mutual_count[friend_of_friend] += 1
88 return [m for m, _ in nlargest(limit, mutual_count.items(), key=lambda kv: kv[1])]
89
90
91 @dataclass
92 class ConnectionRequest:
93 id: str
94 from_id: str
95 to_id: str
96 note: str
97 status: RequestStatus
98 sent_at: datetime
99
100
101 @dataclass
102 class Post:
103 id: str
104 author_id: str
105 content: str
106 created_at: datetime
107 likes: int = 0
108
109
110 class FeedRanker(ABC):
111 @abstractmethod
112 def score(self, post: Post, viewer_id: str) -> float: ...
113
114
115 class ChronologicalRanker(FeedRanker):
116 def score(self, post: Post, viewer_id: str) -> float:
117 # More recent posts score higher. Unix seconds are fine as a scalar.
118 return post.created_at.timestamp()
119
120
121 class EngagementRanker(FeedRanker):
122 """Balance of recency and likes. Decays over time."""
123
124 def __init__(self, half_life: timedelta = timedelta(hours=12)):
125 self._half_life = half_life.total_seconds()
126
127 def score(self, post: Post, viewer_id: str) -> float:
128 age = (datetime.utcnow() - post.created_at).total_seconds()
129 recency = 2 ** (-age / self._half_life)
130 return recency * (1 + post.likes)
131
132
133 class Feed:
134 def __init__(self, graph: ConnectionGraph, ranker: FeedRanker | None = None):
135 self._graph = graph
136 self._posts_by_author: dict[str, list[Post]] = defaultdict(list)
137 self._ranker = ranker or EngagementRanker()
138 self._lock = RLock()
139
140 def record_post(self, post: Post) -> None:
141 with self._lock:
142 self._posts_by_author[post.author_id].append(post)
143
144 def for_member(self, viewer_id: str, limit: int = 50) -> list[Post]:
145 with self._lock:
146 connections = self._graph.neighbors(viewer_id)
147 candidates: list[Post] = []
148 for author_id in connections:
149 candidates.extend(self._posts_by_author[author_id])
150 scored = sorted(candidates, key=lambda p: self._ranker.score(p, viewer_id), reverse=True)
151 return scored[:limit]
152
153
154 @dataclass
155 class Message:
156 id: str
157 thread_id: str
158 author_id: str
159 body: str
160 sent_at: datetime
161 delivered_to: set[str] = field(default_factory=set)
162 read_by: set[str] = field(default_factory=set)
163
164
165 class MessageThread:
166 def __init__(self, participants: set[str]):
167 if len(participants) < 2:
168 raise ValueError("thread needs at least 2 participants")
169 self.id = str(uuid.uuid4())[:8]
170 self.participants = set(participants)
171 self.messages: list[Message] = []
172 self._lock = RLock()
173
174 def append(self, author_id: str, body: str) -> Message:
175 if author_id not in self.participants:
176 raise ValueError("author not a participant")
177 msg = Message(
178 id=str(uuid.uuid4())[:8],
179 thread_id=self.id,
180 author_id=author_id,
181 body=body,
182 sent_at=datetime.utcnow(),
183 )
184 with self._lock:
185 self.messages.append(msg)
186 return msg
187
188 def mark_delivered(self, member_id: str) -> None:
189 if member_id not in self.participants:
190 return
191 with self._lock:
192 for m in self.messages:
193 m.delivered_to.add(member_id)
194
195 def mark_read(self, member_id: str) -> None:
196 if member_id not in self.participants:
197 return
198 with self._lock:
199 for m in self.messages:
200 m.read_by.add(member_id)
201
202
203 class LinkedInService:
204 def __init__(self):
205 self._members: dict[str, Member] = {}
206 self._graph = ConnectionGraph()
207 self._requests: dict[str, ConnectionRequest] = {}
208 self._threads: dict[str, MessageThread] = {}
209 self._feed = Feed(self._graph)
210
211 def register(self, name: str, headline: str = "") -> Member:
212 m = Member(id=str(uuid.uuid4())[:8], profile=Profile(name=name, headline=headline))
213 self._members[m.id] = m
214 return m
215
216 def send_connection_request(self, from_id: str, to_id: str, note: str = "") -> str:
217 if from_id == to_id:
218 raise ValueError("self-connect")
219 if self._graph.are_connected(from_id, to_id):
220 raise RuntimeError("already connected")
221 # Avoid duplicate pending requests.
222 for req in self._requests.values():
223 if req.from_id == from_id and req.to_id == to_id and req.status == RequestStatus.PENDING:
224 return req.id
225 req = ConnectionRequest(
226 id=str(uuid.uuid4())[:8], from_id=from_id, to_id=to_id,
227 note=note, status=RequestStatus.PENDING, sent_at=datetime.utcnow(),
228 )
229 self._requests[req.id] = req
230 return req.id
231
232 def accept_request(self, request_id: str, actor: str) -> None:
233 req = self._requests[request_id]
234 if req.to_id != actor:
235 raise PermissionError("only the recipient can accept")
236 if req.status != RequestStatus.PENDING:
237 raise RuntimeError(f"cannot accept a {req.status.value} request")
238 req.status = RequestStatus.ACCEPTED
239 self._graph.connect(req.from_id, req.to_id)
240
241 def ignore_request(self, request_id: str, actor: str) -> None:
242 """Recipient dismisses the invite without connecting. Sender isn't notified."""
243 req = self._requests[request_id]
244 if req.to_id != actor:
245 raise PermissionError("only the recipient can ignore")
246 if req.status != RequestStatus.PENDING:
247 raise RuntimeError(f"cannot ignore a {req.status.value} request")
248 req.status = RequestStatus.IGNORED
249
250 def withdraw_request(self, request_id: str, actor: str) -> None:
251 """Sender takes the invite back before the recipient responds."""
252 req = self._requests[request_id]
253 if req.from_id != actor:
254 raise PermissionError("only the sender can withdraw")
255 if req.status != RequestStatus.PENDING:
256 raise RuntimeError(f"cannot withdraw a {req.status.value} request")
257 req.status = RequestStatus.WITHDRAWN
258
259 def publish_post(self, author_id: str, content: str) -> Post:
260 post = Post(id=str(uuid.uuid4())[:8], author_id=author_id,
261 content=content, created_at=datetime.utcnow())
262 self._feed.record_post(post)
263 return post
264
265 def get_feed(self, member_id: str, limit: int = 50) -> list[Post]:
266 return self._feed.for_member(member_id, limit)
267
268 def start_thread(self, participants: list[str]) -> str:
269 # Members can only message 1st-degree connections, unless paid tier. Enforced elsewhere.
270 thread = MessageThread(set(participants))
271 self._threads[thread.id] = thread
272 return thread.id
273
274 def send_message(self, thread_id: str, author_id: str, body: str) -> Message:
275 thread = self._threads[thread_id]
276 return thread.append(author_id, body)
277
278 def graph(self) -> ConnectionGraph:
279 return self._graph
280
281
282 if __name__ == "__main__":
283 svc = LinkedInService()
284 alice = svc.register("Alice", "Staff Eng")
285 bob = svc.register("Bob", "Engineer")
286 carol = svc.register("Carol", "Recruiter")
287 dave = svc.register("Dave", "PM")
288
289 # Alice <-> Bob, Bob <-> Carol, Carol <-> Dave.
290 for sender, receiver in [(alice, bob), (bob, carol), (carol, dave)]:
291 rid = svc.send_connection_request(sender.id, receiver.id)
292 svc.accept_request(rid, actor=receiver.id)
293
294 # Distances.
295 g = svc.graph()
296 print(f"Alice -> Bob: degree {g.degree_between(alice.id, bob.id)}") # 1
297 print(f"Alice -> Carol: degree {g.degree_between(alice.id, carol.id)}") # 2
298 print(f"Alice -> Dave: degree {g.degree_between(alice.id, dave.id)}") # 3
299
300 # Ignore: Dave dismisses an unsolicited invite from Alice.
301 rid_ignored = svc.send_connection_request(alice.id, dave.id)
302 svc.ignore_request(rid_ignored, actor=dave.id)
303 print(f"Ignored request status: {svc._requests[rid_ignored].status.value}")
304 assert not g.are_connected(alice.id, dave.id)
305
306 # Withdraw: Alice sends a fresh invite then retracts it herself.
307 rid_withdrawn = svc.send_connection_request(alice.id, dave.id)
308 svc.withdraw_request(rid_withdrawn, actor=alice.id)
309 print(f"Withdrawn request status: {svc._requests[rid_withdrawn].status.value}")
310 assert not g.are_connected(alice.id, dave.id)
311
312 # People You May Know — Carol is Alice's 2nd-degree (via Bob).
313 suggestions = g.suggest_connections(alice.id)
314 print(f"Suggestions for Alice: {[svc._members[s].profile.name for s in suggestions]}")
315
316 # Posts and feed.
317 svc.publish_post(bob.id, "Just shipped a new LLD article.")
318 svc.publish_post(bob.id, "Hiring senior engineers.")
319 feed = svc.get_feed(alice.id)
320 print(f"Alice's feed has {len(feed)} post(s).")
321
322 # Messaging.
323 thread_id = svc.start_thread([alice.id, bob.id])
324 svc.send_message(thread_id, alice.id, "Hi Bob!")
325 svc.send_message(thread_id, bob.id, "Hey Alice.")
326 thread = svc._threads[thread_id]
327 print(f"Thread has {len(thread.messages)} messages.")Common Mistakes
- ✗Storing connections directed (follower/followed). Breaks the 'we are connected' symmetry every feature depends on.
- ✗Computing degree-3 on every profile view. Millions of edges; blow up server CPU.
- ✗Putting feed ranking inside the feed class. New ranking experiment = rewriting the feed.
- ✗Treating messages as mutable — editing alters the conversation history. Use an edit flag plus history.
Key Points
- ✓Connection is undirected — accepting an invite adds an edge both ways.
- ✓Degree computation is BFS bounded to 3 levels. Past degree-3 the graph explodes.
- ✓Feed is pull + cache. Build on read for inactive users, precompute for power users.
- ✓Messaging is append-only per thread; receipts (delivered, read) update separately from messages.