Typeahead Autocomplete
Trie with Flyweight shared prefix nodes, swappable ranking strategies, and Decorator layers for result post-processing. Where data structures meet design patterns.
Key Abstractions
- Flyweight: shared prefix nodes with intrinsic character, extrinsic frequency
- Trie: manages the node tree and insert/search operations
- Strategy: a RankingStrategy interface with FrequencyRanker, RecencyRanker, and TrendingRanker implementations
- Decorator: wraps results — SpellCheckDecorator, PersonalizationDecorator, DedupDecorator
- Factory: creates the right provider per context
- Observer: tracks user selections to feed back into ranking
Class Diagram
How It Works
Typeahead autocomplete is the system behind every search box that finishes your sentences. You type "app" and instantly see "apple", "application", "appetite". The core data structure is a trie: a tree where each edge represents a character and shared prefixes share the same path. "apple" and "application" share the nodes for a-p-p-l, then diverge. That's the Flyweight pattern in its most natural form. The character at each node (intrinsic state) is shared across every word that passes through it. Per-word metadata like frequency (extrinsic state) lives at terminal nodes or in external maps.
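The sharing is easy to see with a throwaway dict-of-dicts trie (a sketch, separate from the TrieNode class in the implementation section): inserting "apple" and "application" creates 12 nodes rather than the 16 characters the two strings contain, because the "appl" path is stored once.

```python
# Minimal dict-of-dicts trie: each nested dict is one shared prefix node.
def insert(root: dict, word: str) -> None:
    node = root
    for ch in word:
        node = node.setdefault(ch, {})  # reuse the node if the prefix exists

def count_nodes(node: dict) -> int:
    return sum(1 + count_nodes(child) for child in node.values())

root: dict = {}
insert(root, "apple")        # 5 characters
insert(root, "application")  # 11 characters
print(count_nodes(root))     # 12 nodes, not 16: "appl" is stored once
```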
But a trie alone only gives you completions. It doesn't tell you which ones matter. The Strategy pattern fills that gap. A search bar ranks by frequency (show the most popular completions). A code editor ranks by recency (show what the developer typed last). A trending feature blends both. The ranking algorithm is a swappable strategy: you can change it without touching a single line of trie code.
Raw results from the ranker often need post-processing. Maybe you want deduplication (case-insensitive). Maybe you want spell-check suggestions mixed in. Maybe personalization reorders based on user history. Each of these is a Decorator that wraps the result list, transforms it, and passes it along. Stack them in any order. Add new ones without modifying existing code.
The Observer pattern closes the feedback loop. When a user clicks on a suggestion, that selection event propagates to listeners that boost the selected word's frequency in the trie. The system gets smarter with every interaction.
Requirements
Functional
- suggest(prefix, k) returns up to k autocomplete suggestions for a given prefix
- Insert new words with associated frequency scores
- Support multiple ranking strategies: frequency-based, recency-based, trending
- Post-process results through composable decorator layers (dedup, normalization)
- Track user selections and feed them back as frequency boosts
Non-Functional
- O(p + n) prefix search where p is prefix length and n is completions to collect
- O(w) insert where w is word length; prefix sharing keeps memory sublinear in total characters
- Ranking and decoration must be swappable at runtime without modifying core trie logic
- Thread-safe trie operations for concurrent read/write access
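The thread-safety requirement can be sketched with a coarse-grained lock (a real system might prefer a readers-writer lock for the read-heavy workload). The word-to-count map here is a deliberately simplified stand-in for the trie; the class and method names are illustrative, not part of the implementation below.

```python
import threading

class ThreadSafeCounterTrie:
    """Simplified stand-in for the trie: a word->count map behind a lock."""

    def __init__(self) -> None:
        self._counts: dict[str, int] = {}
        self._lock = threading.Lock()

    def insert(self, word: str, frequency: int = 1) -> None:
        with self._lock:  # serialize the read-modify-write of the count
            self._counts[word] = self._counts.get(word, 0) + frequency

    def suggest(self, prefix: str) -> list[str]:
        with self._lock:
            return sorted(w for w in self._counts if w.startswith(prefix))

t = ThreadSafeCounterTrie()
threads = [threading.Thread(target=t.insert, args=("app",)) for _ in range(8)]
for th in threads:
    th.start()
for th in threads:
    th.join()
print(t.suggest("app"))  # ['app'] with a count of 8, no lost updates
```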
Design Decisions
Why not just use a HashMap?
A HashMap mapping every prefix substring to its completions gives O(1) lookup but costs O(w^2) storage per word (every prefix is a key). A trie stores each word once, sharing prefixes automatically. For a dictionary with millions of words sharing common prefixes, the memory difference is enormous. The trie also supports ranked traversal: you walk the subtree and collect results in priority order. A flat map can't do that without precomputing all prefix-to-completions mappings.
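A rough back-of-the-envelope comparison on a three-word list (illustrative words; the counting trick exploits the fact that distinct prefixes correspond one-to-one with trie nodes):

```python
words = ["apple", "application", "applied"]

# HashMap approach: every prefix of every word becomes a key.
prefixes = {w[:i] for w in words for i in range(1, len(w) + 1)}
hashmap_chars = sum(len(p) for p in prefixes)  # key storage alone

# Trie approach: one node per distinct prefix, one character each.
trie_nodes = len(prefixes)

print(hashmap_chars, trie_nodes)  # 84 characters of keys vs. 14 nodes
```

And the HashMap cost above counts only the keys; each key additionally maps to a completions list, which the trie gets for free by walking the subtree.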
How much memory does Flyweight actually save?
Consider inserting 10,000 English words. A naive approach stores each as an independent string: about 50,000 characters total. A trie shares prefixes. "un-" alone is a prefix for hundreds of words but uses only 2 nodes. In practice, tries for English text use 40-60% fewer nodes than total characters across all words. The Flyweight pattern formalizes this: character identity (intrinsic state) is shared in the tree structure, while per-word metadata like frequency (extrinsic state) is stored only at terminal nodes.
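The sharing claim can be sanity-checked on a toy list (illustrative "un-" words, not a real dictionary sample):

```python
words = ["undo", "unit", "under", "undone", "uniform", "unless"]

def trie_node_count(words: list[str]) -> int:
    # Each distinct prefix corresponds to exactly one trie node.
    return len({w[:i] for w in words for i in range(1, len(w) + 1)})

nodes = trie_node_count(words)
total_chars = sum(len(w) for w in words)
print(nodes, total_chars)  # 18 nodes vs. 32 characters, roughly 44% fewer
```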
What's the case for Strategy over hardcoded sort?
Hardcoding sorted(results, key=lambda x: -x.frequency) inside the trie works for a prototype. But the moment you need recency ranking for a code editor, trending for a news site, or personalized ranking for logged-in users, you're piling conditionals into core search logic. Strategy extracts ranking into its own hierarchy. The trie produces raw candidates; the strategy decides their order. A new ranking algorithm is one new class.
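The same idea in miniature, with ranking reduced to a plain callable (a sketch independent of the RankingStrategy classes in the implementation section; the function names are hypothetical):

```python
from typing import Callable

# A candidate is (word, frequency); a ranker is any callable that orders them.
Ranker = Callable[[list[tuple[str, int]]], list[str]]

def by_frequency(cands: list[tuple[str, int]]) -> list[str]:
    return [w for w, _ in sorted(cands, key=lambda c: -c[1])]

def alphabetical(cands: list[tuple[str, int]]) -> list[str]:
    return sorted(w for w, _ in cands)

def suggest(cands: list[tuple[str, int]], ranker: Ranker) -> list[str]:
    # The search logic never changes; only the injected ranker does.
    return ranker(cands)

cands = [("banana", 40), ("band", 35), ("ban", 15)]
print(suggest(cands, by_frequency))   # ['banana', 'band', 'ban']
print(suggest(cands, alphabetical))   # ['ban', 'banana', 'band']
```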
Decorator vs. a simple filter chain?
A filter chain (list of functions) achieves similar composability. Decorator adds structure: each decorator is a class that can hold state (a spell-check dictionary, a personalization model), be independently tested, and be dynamically added or removed. The chain of decorators is also explicit in the object graph. You can inspect and modify it at runtime, which matters when different user contexts need different post-processing pipelines.
Interview Follow-ups
- "How would you distribute the trie across multiple servers?" Shard by first character or by consistent hashing on prefixes. Each shard holds a subtrie. A routing layer directs the prefix query to the right shard. For popular prefixes, replicate those subtries across multiple nodes.
- "How do you handle real-time updates when new trending terms appear?" Use a two-tier system. A small hot trie in memory holds recent high-frequency terms, updated in real time. A larger cold trie holds the full dictionary, rebuilt periodically. Merge results at query time. The hot trie is small enough to rebuild frequently.
- "How would you handle typos in the prefix?" Integrate edit-distance search. For each prefix, also search for words within Levenshtein distance 1 or 2. This is expensive on a raw trie, so use a BK-tree or precomputed fuzzy index alongside the main trie. The SpellCheckDecorator in our design is the hook point for this.
- "How do you scale to billions of queries per day?" Cache the top-k results for the most common prefixes (the top 10,000 prefixes handle 90% of traffic). Use a CDN for single-character prefixes. Rate-limit per user to prevent abuse. The trie itself is read-heavy, so replicate across many read replicas with infrequent write propagation.
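The prefix-caching idea from the last follow-up can be sketched with functools.lru_cache standing in for a production cache such as Redis (the corpus and function names are hypothetical):

```python
from functools import lru_cache

def expensive_suggest(prefix: str, k: int) -> tuple[str, ...]:
    # Stand-in for the real trie search + ranking over a tiny corpus.
    corpus = ["app", "apple", "application", "banana"]
    return tuple(w for w in corpus if w.startswith(prefix))[:k]

@lru_cache(maxsize=10_000)  # top prefixes dominate traffic, so hit rate is high
def cached_suggest(prefix: str, k: int) -> tuple[str, ...]:
    return expensive_suggest(prefix, k)

print(cached_suggest("app", 2))          # ('app', 'apple')
print(cached_suggest.cache_info().hits)  # 0: first call was a miss
cached_suggest("app", 2)
print(cached_suggest.cache_info().hits)  # 1: repeat query served from cache
```

The tuple return type matters: lru_cache requires hashable values if results are ever reused as keys, and an immutable result also prevents one caller from mutating another's cached list.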
Code Implementation
from __future__ import annotations

import time
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


# ---------------------------------------------------------------------------
# Flyweight: TrieNode shares prefix paths across all inserted words
# ---------------------------------------------------------------------------

@dataclass
class TrieNode:
    """Flyweight node. Intrinsic state: position in the tree (character implied
    by the parent edge). Extrinsic state: frequency is stored per word at the
    terminal node only."""

    children: dict[str, TrieNode] = field(default_factory=dict)
    is_end: bool = False
    frequency: int = 0


class Trie:
    """Core data structure. Insert words and collect completions for a prefix."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, word: str, frequency: int = 1) -> None:
        node = self.root
        for ch in word:
            if ch not in node.children:
                node.children[ch] = TrieNode()
            node = node.children[ch]
        node.is_end = True
        node.frequency += frequency

    def _find_node(self, prefix: str) -> TrieNode | None:
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return None
            node = node.children[ch]
        return node

    def search_prefix(self, prefix: str, limit: int = 10) -> list[Candidate]:
        """Return all completions under *prefix* up to *limit*."""
        start = self._find_node(prefix)
        if start is None:
            return []
        results: list[Candidate] = []
        self._dfs(start, list(prefix), results, limit)
        return results

    def _dfs(
        self,
        node: TrieNode,
        path: list[str],
        results: list[Candidate],
        limit: int,
    ) -> None:
        if len(results) >= limit:
            return
        if node.is_end:
            results.append(Candidate("".join(path), node.frequency))
        for ch in sorted(node.children):
            self._dfs(node.children[ch], path + [ch], results, limit)
            if len(results) >= limit:
                return


# ---------------------------------------------------------------------------
# Candidate value object
# ---------------------------------------------------------------------------

@dataclass
class Candidate:
    word: str
    frequency: int
    timestamp: float = 0.0


# ---------------------------------------------------------------------------
# Strategy: pluggable ranking algorithms
# ---------------------------------------------------------------------------

class RankingStrategy(ABC):
    @abstractmethod
    def rank(self, candidates: list[Candidate]) -> list[Candidate]: ...


class FrequencyRanker(RankingStrategy):
    """Rank by descending frequency: most popular first."""

    def rank(self, candidates: list[Candidate]) -> list[Candidate]:
        return sorted(candidates, key=lambda c: -c.frequency)


class RecencyRanker(RankingStrategy):
    """Rank by most recently selected: freshest first."""

    def __init__(self) -> None:
        self._timestamps: dict[str, float] = {}

    def record(self, word: str) -> None:
        self._timestamps[word] = time.time()

    def rank(self, candidates: list[Candidate]) -> list[Candidate]:
        for c in candidates:
            c.timestamp = self._timestamps.get(c.word, 0.0)
        return sorted(candidates, key=lambda c: -c.timestamp)


class TrendingRanker(RankingStrategy):
    """Combine frequency and recency with a weighted score."""

    def __init__(self, freq_weight: float = 0.6, recency_weight: float = 0.4) -> None:
        self._fw = freq_weight
        self._rw = recency_weight
        self._timestamps: dict[str, float] = {}

    def record(self, word: str) -> None:
        self._timestamps[word] = time.time()

    def rank(self, candidates: list[Candidate]) -> list[Candidate]:
        now = time.time()

        def score(c: Candidate) -> float:
            ts = self._timestamps.get(c.word, 0.0)
            recency = 1.0 / (1.0 + now - ts) if ts else 0.0
            return self._fw * c.frequency + self._rw * recency * 100

        return sorted(candidates, key=lambda c: -score(c))


# ---------------------------------------------------------------------------
# Decorator: composable post-processing layers
# ---------------------------------------------------------------------------

class ResultDecorator(ABC):
    @abstractmethod
    def apply(self, results: list[str]) -> list[str]: ...


class DedupDecorator(ResultDecorator):
    """Remove duplicate suggestions (case-insensitive)."""

    def apply(self, results: list[str]) -> list[str]:
        seen: set[str] = set()
        out: list[str] = []
        for r in results:
            key = r.lower()
            if key not in seen:
                seen.add(key)
                out.append(r)
        return out


class CaseNormalizeDecorator(ResultDecorator):
    """Lowercase all suggestions for uniform display."""

    def apply(self, results: list[str]) -> list[str]:
        return [r.lower() for r in results]


# ---------------------------------------------------------------------------
# Observer: track user selections to feed back into ranking
# ---------------------------------------------------------------------------

class SearchListener(ABC):
    @abstractmethod
    def on_selection(self, query: str, selected: str) -> None: ...


class FrequencyBoostListener(SearchListener):
    """Boost frequency in the trie when a user picks a suggestion."""

    def __init__(self, trie: Trie) -> None:
        self._trie = trie

    def on_selection(self, query: str, selected: str) -> None:
        self._trie.insert(selected, frequency=1)
        print(f"  [Observer] Boosted '{selected}' frequency after selection")


# ---------------------------------------------------------------------------
# Factory: SuggestionProvider wires everything together
# ---------------------------------------------------------------------------

class SuggestionProvider:
    """Factory-created provider that combines trie search, ranking strategy,
    decorator chain, and observer notification."""

    def __init__(
        self,
        trie: Trie,
        strategy: RankingStrategy,
        decorators: list[ResultDecorator] | None = None,
        listeners: list[SearchListener] | None = None,
    ) -> None:
        self._trie = trie
        self._strategy = strategy
        self._decorators = decorators or []
        self._listeners = listeners or []

    def suggest(self, prefix: str, k: int = 5) -> list[str]:
        candidates = self._trie.search_prefix(prefix, limit=50)
        ranked = self._strategy.rank(candidates)[:k]
        results = [c.word for c in ranked]
        for decorator in self._decorators:
            results = decorator.apply(results)
        return results

    def select(self, query: str, selected: str) -> None:
        """Notify observers that a user picked a suggestion."""
        for listener in self._listeners:
            listener.on_selection(query, selected)

    @staticmethod
    def create(
        trie: Trie,
        context: str = "search",
    ) -> SuggestionProvider:
        """Factory method: build the right provider for the context."""
        if context == "code_editor":
            strategy: RankingStrategy = RecencyRanker()
            decorators: list[ResultDecorator] = [DedupDecorator()]
        else:
            strategy = FrequencyRanker()
            decorators = [DedupDecorator(), CaseNormalizeDecorator()]
        listener = FrequencyBoostListener(trie)
        return SuggestionProvider(trie, strategy, decorators, [listener])


# ---------------------------------------------------------------------------
# Demo
# ---------------------------------------------------------------------------

if __name__ == "__main__":
    trie = Trie()
    words = [
        ("apple", 50), ("application", 30), ("applied", 20),
        ("app", 80), ("appetite", 10), ("banana", 40),
        ("band", 35), ("bandwidth", 25), ("ban", 15),
    ]
    for word, freq in words:
        trie.insert(word, freq)

    print("=== Frequency-Ranked Search (default) ===")
    provider = SuggestionProvider.create(trie, context="search")
    results = provider.suggest("app", k=5)
    print(f"  prefix='app' -> {results}")

    results = provider.suggest("ban", k=3)
    print(f"  prefix='ban' -> {results}")

    print("\n=== Recency-Ranked Search (code editor) ===")
    editor_provider = SuggestionProvider.create(trie, context="code_editor")
    recency_ranker: RecencyRanker = editor_provider._strategy  # type: ignore
    recency_ranker.record("application")
    recency_ranker.record("applied")
    results = editor_provider.suggest("app", k=5)
    print(f"  prefix='app' -> {results}")

    print("\n=== Decorator Effects ===")
    # Case variant of "apple" that still shares the "app" prefix path, so it
    # shows up in raw results and exercises the case-insensitive dedup.
    trie.insert("appLe", 5)
    raw_candidates = trie.search_prefix("app", limit=10)
    raw_words = [c.word for c in raw_candidates]
    print(f"  Raw results: {raw_words}")

    dedup = DedupDecorator()
    deduped = dedup.apply(raw_words)
    print(f"  After DedupDecorator: {deduped}")

    normalizer = CaseNormalizeDecorator()
    normalized = normalizer.apply(deduped)
    print(f"  After CaseNormalizeDecorator: {normalized}")

    print("\n=== Observer Feedback ===")
    provider.select("app", "application")
    node = trie._find_node("application")
    print(f"  'application' frequency after selection: {node.frequency if node else 'N/A'}")

    results_after = provider.suggest("app", k=5)
    print(f"  prefix='app' after boost -> {results_after}")

    print("\n=== Trending Ranker ===")
    trending = TrendingRanker()
    trending.record("appetite")
    trending_provider = SuggestionProvider(
        trie, trending, [DedupDecorator(), CaseNormalizeDecorator()]
    )
    results = trending_provider.suggest("app", k=5)
    print(f"  prefix='app' (trending) -> {results}")

    print("\nAll operations completed successfully.")
Common Mistakes
- ✗ Storing full strings at every node instead of sharing prefixes. That defeats Flyweight and wastes memory.
- ✗ Hardcoding ranking logic inside the Trie. Violates SRP and can't adapt to different contexts.
- ✗ Rebuilding the entire result list through all decorators on every keystroke. You need caching or debouncing.
- ✗ Not limiting result count early. Searching millions of completions and truncating at the end is wasteful.
Key Points
- ✓ Flyweight: trie nodes share prefix paths. 'apple' and 'application' share 4 nodes. The character (intrinsic state) is shared; per-word frequency (extrinsic state) lives outside the shared structure.
- ✓ Strategy: the ranking algorithm is hot-swappable. A search bar uses frequency; a code editor uses recency.
- ✓ Decorator: post-processing layers compose. Each wraps the result list without modifying the core search.
- ✓ Observer: when a user clicks a suggestion, the selection feeds back as a frequency boost signal.