Platform ScaleProblem 8 of 29

Platform ScaleIntermediate

Typeahead Autocomplete

Flyweight PatternStrategy PatternDecorator PatternFactory PatternObserver Pattern

Trie with Flyweight shared prefix nodes, swappable ranking strategies, and Decorator layers for result post-processing. Where data structures meet design patterns.

Key Abstractions

TrieNode

Flyweight: shared prefix nodes with intrinsic char, extrinsic frequency

Trie

Manages the node tree, insert/search operations

RankingStrategy

Interface: FrequencyRanker, RecencyRanker, TrendingRanker

ResultDecorator

Wraps results: SpellCheckDecorator, PersonalizationDecorator, DedupDecorator

SuggestionProvider

Factory creates the right provider per context

SearchListener

Observer: tracks user selections to feed back into ranking

Class Diagram

How It Works

Typeahead autocomplete is the system behind every search box that finishes your sentences. You type "app" and instantly see "apple", "application", "appetite". The core data structure is a trie: a tree where each edge represents a character and shared prefixes share the same path. "apple" and "application" share the nodes for a-p-p-l, then diverge. That's the Flyweight pattern in its most natural form. The character at each node (intrinsic state) is shared across every word that passes through it. Per-word metadata like frequency (extrinsic state) lives at terminal nodes or in external maps.

But a trie alone only gives you completions. It doesn't tell you which ones matter. The Strategy pattern fills that gap. A search bar ranks by frequency (show the most popular completions). A code editor ranks by recency (show what the developer typed last). A trending feature blends both. The ranking algorithm is a swappable strategy: you can change it without touching a single line of trie code.

Raw results from the ranker often need post-processing. Maybe you want deduplication (case-insensitive). Maybe you want spell-check suggestions mixed in. Maybe personalization reorders based on user history. Each of these is a Decorator that wraps the result list, transforms it, and passes it along. Stack them in any order. Add new ones without modifying existing code.

The Observer pattern closes the feedback loop. When a user clicks on a suggestion, that selection event propagates to listeners that boost the selected word's frequency in the trie. The system gets smarter with every interaction.

Requirements

Functional

suggest(prefix, k) returns up to k autocomplete suggestions for a given prefix
Insert new words with associated frequency scores
Support multiple ranking strategies: frequency-based, recency-based, trending
Post-process results through composable decorator layers (dedup, normalization)
Track user selections and feed them back as frequency boosts

Non-Functional

O(p + n) prefix search where p is prefix length and n is completions to collect
O(w) insert where w is word length; prefix sharing keeps memory sublinear in total characters
Ranking and decoration must be swappable at runtime without modifying core trie logic
Thread-safe trie operations for concurrent read/write access

Design Decisions

Why not just use a HashMap?

A HashMap mapping every prefix substring to its completions gives O(1) lookup but costs O(w^2) storage per word (every prefix is a key). A trie stores each word once, sharing prefixes automatically. For a dictionary with millions of words sharing common prefixes, the memory difference is enormous. The trie also supports ranked traversal: you walk the subtree and collect results in priority order. A flat map can't do that without precomputing all prefix-to-completions mappings.

How much memory does Flyweight actually save?

Consider inserting 10,000 English words. A naive approach stores each as an independent string: about 50,000 characters total. A trie shares prefixes. "un-" alone is a prefix for hundreds of words but uses only 2 nodes. In practice, tries for English text use 40-60% fewer nodes than total characters across all words. The Flyweight pattern formalizes this: character identity (intrinsic state) is shared in the tree structure, while per-word metadata like frequency (extrinsic state) is stored only at terminal nodes.

What's the case for Strategy over hardcoded sort?

Hardcoding sorted(results, key=lambda x: -x.frequency) inside the trie works for a prototype. But the moment you need recency ranking for a code editor, trending for a news site, or personalized ranking for logged-in users, you're piling conditionals into core search logic. Strategy extracts ranking into its own hierarchy. The trie produces raw candidates; the strategy decides their order. A new ranking algorithm is one new class.

Decorator vs. a simple filter chain?

A filter chain (list of functions) achieves similar composability. Decorator adds structure: each decorator is a class that can hold state (a spell-check dictionary, a personalization model), be independently tested, and be dynamically added or removed. The chain of decorators is also explicit in the object graph. You can inspect and modify it at runtime, which matters when different user contexts need different post-processing pipelines.

Interview Follow-ups

"How would you distribute the trie across multiple servers?" Shard by first character or by consistent hashing on prefixes. Each shard holds a subtrie. A routing layer directs the prefix query to the right shard. For popular prefixes, replicate those subtries across multiple nodes.
"How do you handle real-time updates when new trending terms appear?" Use a two-tier system. A small hot trie in memory holds recent high-frequency terms, updated in real time. A larger cold trie holds the full dictionary, rebuilt periodically. Merge results at query time. The hot trie is small enough to rebuild frequently.
"How would you handle typos in the prefix?" Integrate edit-distance search. For each prefix, also search for words within Levenshtein distance 1 or 2. This is expensive on a raw trie, so use a BK-tree or precomputed fuzzy index alongside the main trie. The SpellCheckDecorator in our design is the hook point for this.
"How do you scale to billions of queries per day?" Cache the top-k results for the most common prefixes (the top 10,000 prefixes handle 90% of traffic). Use a CDN for single-character prefixes. Rate-limit per user to prevent abuse. The trie itself is read-heavy, so replicate across many read replicas with infrequent write propagation.

Code Implementation

  1  from __future__ import annotations
  2  
  3  import time
  4  from abc import ABC, abstractmethod
  5  from collections import defaultdict
  6  from dataclasses import dataclass, field
  7  
  8  
  9  # ---------------------------------------------------------------------------
 10  # Flyweight: TrieNode shares prefix paths across all inserted words
 11  # ---------------------------------------------------------------------------
 12  
 13  @dataclass
 14  class TrieNode:
 15      """Flyweight node. Intrinsic state: position in the tree (character implied
 16      by parent edge). Extrinsic state: frequency is stored per-word at the
 17      terminal node only."""
 18      children: dict[str, TrieNode] = field(default_factory=dict)
 19      is_end: bool = False
 20      frequency: int = 0
 21  
 22  
 23  class Trie:
 24      """Core data structure. Insert words and collect completions for a prefix."""
 25  
 26      def __init__(self) -> None:
 27          self.root = TrieNode()
 28  
 29      def insert(self, word: str, frequency: int = 1) -> None:
 30          node = self.root
 31          for ch in word:
 32              if ch not in node.children:
 33                  node.children[ch] = TrieNode()
 34              node = node.children[ch]
 35          node.is_end = True
 36          node.frequency += frequency
 37  
 38      def _find_node(self, prefix: str) -> TrieNode | None:
 39          node = self.root
 40          for ch in prefix:
 41              if ch not in node.children:
 42                  return None
 43              node = node.children[ch]
 44          return node
 45  
 46      def search_prefix(self, prefix: str, limit: int = 10) -> list[Candidate]:
 47          """Return all completions under *prefix* up to *limit*."""
 48          start = self._find_node(prefix)
 49          if start is None:
 50              return []
 51          results: list[Candidate] = []
 52          self._dfs(start, list(prefix), results, limit)
 53          return results
 54  
 55      def _dfs(
 56          self,
 57          node: TrieNode,
 58          path: list[str],
 59          results: list[Candidate],
 60          limit: int,
 61      ) -> None:
 62          if len(results) >= limit:
 63              return
 64          if node.is_end:
 65              results.append(Candidate("".join(path), node.frequency))
 66          for ch in sorted(node.children):
 67              self._dfs(node.children[ch], path + [ch], results, limit)
 68              if len(results) >= limit:
 69                  return
 70  
 71  
 72  # ---------------------------------------------------------------------------
 73  # Candidate value object
 74  # ---------------------------------------------------------------------------
 75  
 76  @dataclass
 77  class Candidate:
 78      word: str
 79      frequency: int
 80      timestamp: float = 0.0
 81  
 82  
 83  # ---------------------------------------------------------------------------
 84  # Strategy: pluggable ranking algorithms
 85  # ---------------------------------------------------------------------------
 86  
 87  class RankingStrategy(ABC):
 88      @abstractmethod
 89      def rank(self, candidates: list[Candidate]) -> list[Candidate]: ...
 90  
 91  
 92  class FrequencyRanker(RankingStrategy):
 93      """Rank by descending frequency : most popular first."""
 94  
 95      def rank(self, candidates: list[Candidate]) -> list[Candidate]:
 96          return sorted(candidates, key=lambda c: -c.frequency)
 97  
 98  
 99  class RecencyRanker(RankingStrategy):
100      """Rank by most recently selected : freshest first."""
101  
102      def __init__(self) -> None:
103          self._timestamps: dict[str, float] = {}
104  
105      def record(self, word: str) -> None:
106          self._timestamps[word] = time.time()
107  
108      def rank(self, candidates: list[Candidate]) -> list[Candidate]:
109          for c in candidates:
110              c.timestamp = self._timestamps.get(c.word, 0.0)
111          return sorted(candidates, key=lambda c: -c.timestamp)
112  
113  
114  class TrendingRanker(RankingStrategy):
115      """Combine frequency and recency with a weighted score."""
116  
117      def __init__(self, freq_weight: float = 0.6, recency_weight: float = 0.4) -> None:
118          self._fw = freq_weight
119          self._rw = recency_weight
120          self._timestamps: dict[str, float] = {}
121  
122      def record(self, word: str) -> None:
123          self._timestamps[word] = time.time()
124  
125      def rank(self, candidates: list[Candidate]) -> list[Candidate]:
126          now = time.time()
127          def score(c: Candidate) -> float:
128              ts = self._timestamps.get(c.word, 0.0)
129              recency = 1.0 / (1.0 + now - ts) if ts else 0.0
130              return self._fw * c.frequency + self._rw * recency * 100
131          return sorted(candidates, key=lambda c: -score(c))
132  
133  
134  # ---------------------------------------------------------------------------
135  # Decorator: composable post-processing layers
136  # ---------------------------------------------------------------------------
137  
138  class ResultDecorator(ABC):
139      @abstractmethod
140      def apply(self, results: list[str]) -> list[str]: ...
141  
142  
143  class DedupDecorator(ResultDecorator):
144      """Remove duplicate suggestions (case-insensitive)."""
145  
146      def apply(self, results: list[str]) -> list[str]:
147          seen: set[str] = set()
148          out: list[str] = []
149          for r in results:
150              key = r.lower()
151              if key not in seen:
152                  seen.add(key)
153                  out.append(r)
154          return out
155  
156  
157  class CaseNormalizeDecorator(ResultDecorator):
158      """Lowercase all suggestions for uniform display."""
159  
160      def apply(self, results: list[str]) -> list[str]:
161          return [r.lower() for r in results]
162  
163  
164  # ---------------------------------------------------------------------------
165  # Observer: track user selections to feed back into ranking
166  # ---------------------------------------------------------------------------
167  
168  class SearchListener(ABC):
169      @abstractmethod
170      def on_selection(self, query: str, selected: str) -> None: ...
171  
172  
173  class FrequencyBoostListener(SearchListener):
174      """Boost frequency in the trie when a user picks a suggestion."""
175  
176      def __init__(self, trie: Trie) -> None:
177          self._trie = trie
178  
179      def on_selection(self, query: str, selected: str) -> None:
180          self._trie.insert(selected, frequency=1)
181          print(f"  [Observer] Boosted '{selected}' frequency after selection")
182  
183  
184  # ---------------------------------------------------------------------------
185  # Factory: SuggestionProvider wires everything together
186  # ---------------------------------------------------------------------------
187  
188  class SuggestionProvider:
189      """Factory-created provider that combines trie search, ranking strategy,
190      decorator chain, and observer notification."""
191  
192      def __init__(
193          self,
194          trie: Trie,
195          strategy: RankingStrategy,
196          decorators: list[ResultDecorator] | None = None,
197          listeners: list[SearchListener] | None = None,
198      ) -> None:
199          self._trie = trie
200          self._strategy = strategy
201          self._decorators = decorators or []
202          self._listeners = listeners or []
203  
204      def suggest(self, prefix: str, k: int = 5) -> list[str]:
205          candidates = self._trie.search_prefix(prefix, limit=50)
206          ranked = self._strategy.rank(candidates)[:k]
207          results = [c.word for c in ranked]
208          for decorator in self._decorators:
209              results = decorator.apply(results)
210          return results
211  
212      def select(self, query: str, selected: str) -> None:
213          """Notify observers that a user picked a suggestion."""
214          for listener in self._listeners:
215              listener.on_selection(query, selected)
216  
217      @staticmethod
218      def create(
219          trie: Trie,
220          context: str = "search",
221      ) -> SuggestionProvider:
222          """Factory method: build the right provider for the context."""
223          if context == "code_editor":
224              strategy: RankingStrategy = RecencyRanker()
225              decorators: list[ResultDecorator] = [DedupDecorator()]
226          else:
227              strategy = FrequencyRanker()
228              decorators = [DedupDecorator(), CaseNormalizeDecorator()]
229          listener = FrequencyBoostListener(trie)
230          return SuggestionProvider(trie, strategy, decorators, [listener])
231  
232  
233  # ---------------------------------------------------------------------------
234  # Demo
235  # ---------------------------------------------------------------------------
236  
237  if __name__ == "__main__":
238      trie = Trie()
239      words = [
240          ("apple", 50), ("application", 30), ("applied", 20),
241          ("app", 80), ("appetite", 10), ("banana", 40),
242          ("band", 35), ("bandwidth", 25), ("ban", 15),
243      ]
244      for word, freq in words:
245          trie.insert(word, freq)
246  
247      print("=== Frequency-Ranked Search (default) ===")
248      provider = SuggestionProvider.create(trie, context="search")
249      results = provider.suggest("app", k=5)
250      print(f"  prefix='app' -> {results}")
251  
252      results = provider.suggest("ban", k=3)
253      print(f"  prefix='ban' -> {results}")
254  
255      print("\n=== Recency-Ranked Search (code editor) ===")
256      editor_provider = SuggestionProvider.create(trie, context="code_editor")
257      recency_ranker: RecencyRanker = editor_provider._strategy  # type: ignore
258      recency_ranker.record("application")
259      recency_ranker.record("applied")
260      results = editor_provider.suggest("app", k=5)
261      print(f"  prefix='app' -> {results}")
262  
263      print("\n=== Decorator Effects ===")
264      trie.insert("Apple", 5)  # duplicate with different case
265      raw_candidates = trie.search_prefix("app", limit=10)
266      raw_words = [c.word for c in raw_candidates]
267      print(f"  Raw results: {raw_words}")
268  
269      dedup = DedupDecorator()
270      deduped = dedup.apply(raw_words)
271      print(f"  After DedupDecorator: {deduped}")
272  
273      normalizer = CaseNormalizeDecorator()
274      normalized = normalizer.apply(deduped)
275      print(f"  After CaseNormalizeDecorator: {normalized}")
276  
277      print("\n=== Observer Feedback ===")
278      provider.select("app", "application")
279      node = trie._find_node("application")
280      print(f"  'application' frequency after selection: {node.frequency if node else 'N/A'}")
281  
282      results_after = provider.suggest("app", k=5)
283      print(f"  prefix='app' after boost -> {results_after}")
284  
285      print("\n=== Trending Ranker ===")
286      trending = TrendingRanker()
287      trending.record("appetite")
288      trending_provider = SuggestionProvider(
289          trie, trending, [DedupDecorator(), CaseNormalizeDecorator()]
290      )
291      results = trending_provider.suggest("app", k=5)
292      print(f"  prefix='app' (trending) -> {results}")
293  
294      print("\nAll operations completed successfully.")

Common Mistakes

✗Storing full strings at every node instead of sharing prefixes. That defeats Flyweight and wastes memory.
✗Hardcoding ranking logic inside the Trie. Violates SRP and can't adapt to different contexts.
✗Rebuilding the entire result list through all decorators on every keystroke. You need caching or debounce.
✗Not limiting result count early. Searching millions of completions and truncating at the end is wasteful.

Key Points

✓Flyweight: Trie nodes share prefix paths. 'apple' and 'application' share 4 nodes. The character (intrinsic state) is shared; frequency per context (extrinsic) is external.
✓Strategy: the ranking algorithm is hot-swappable. Search bar uses frequency, code editor uses recency.
✓Decorator: post-processing layers compose. Each wraps the result list without modifying the core search.
✓Observer: when a user clicks a suggestion, that feeds back as a frequency boost signal.

Related Designs