Typeahead Autocomplete
Trie with Flyweight shared prefix nodes, swappable ranking strategies, and Decorator layers for result post-processing. Where data structures meet design patterns.
Key Abstractions
Flyweight: shared prefix nodes with intrinsic char, extrinsic frequency
Manages the node tree, insert/search operations
Interface: FrequencyRanker, RecencyRanker, TrendingRanker
Wraps results: SpellCheckDecorator, PersonalizationDecorator, DedupDecorator
Factory creates the right provider per context
Observer: tracks user selections to feed back into ranking
Class Diagram
How It Works
Typeahead autocomplete is the system behind every search box that finishes your sentences. You type "app" and instantly see "apple", "application", "appetite". The core data structure is a trie: a tree where each edge represents a character and shared prefixes share the same path. "apple" and "application" share the nodes for a-p-p-l, then diverge. That's the Flyweight pattern in its most natural form. The character at each node (intrinsic state) is shared across every word that passes through it. Per-word metadata like frequency (extrinsic state) lives at terminal nodes or in external maps.
But a trie alone only gives you completions. It doesn't tell you which ones matter. The Strategy pattern fills that gap. A search bar ranks by frequency (show the most popular completions). A code editor ranks by recency (show what the developer typed last). A trending feature blends both. The ranking algorithm is a swappable strategy: you can change it without touching a single line of trie code.
Raw results from the ranker often need post-processing. Maybe you want deduplication (case-insensitive). Maybe you want spell-check suggestions mixed in. Maybe personalization reorders based on user history. Each of these is a Decorator that wraps the result list, transforms it, and passes it along. Stack them in any order. Add new ones without modifying existing code.
The Observer pattern closes the feedback loop. When a user clicks on a suggestion, that selection event propagates to listeners that boost the selected word's frequency in the trie. The system gets smarter with every interaction.
Requirements
Functional
suggest(prefix, k)returns up to k autocomplete suggestions for a given prefix- Insert new words with associated frequency scores
- Support multiple ranking strategies: frequency-based, recency-based, trending
- Post-process results through composable decorator layers (dedup, normalization)
- Track user selections and feed them back as frequency boosts
Non-Functional
- O(p + n) prefix search where p is prefix length and n is completions to collect
- O(w) insert where w is word length; prefix sharing keeps memory sublinear in total characters
- Ranking and decoration must be swappable at runtime without modifying core trie logic
- Thread-safe trie operations for concurrent read/write access
Design Decisions
Why not just use a HashMap?
A HashMap mapping every prefix substring to its completions gives O(1) lookup but costs O(w^2) storage per word (every prefix is a key). A trie stores each word once, sharing prefixes automatically. For a dictionary with millions of words sharing common prefixes, the memory difference is enormous. The trie also supports ranked traversal: you walk the subtree and collect results in priority order. A flat map can't do that without precomputing all prefix-to-completions mappings.
How much memory does Flyweight actually save?
Consider inserting 10,000 English words. A naive approach stores each as an independent string: about 50,000 characters total. A trie shares prefixes. "un-" alone is a prefix for hundreds of words but uses only 2 nodes. In practice, tries for English text use 40-60% fewer nodes than total characters across all words. The Flyweight pattern formalizes this: character identity (intrinsic state) is shared in the tree structure, while per-word metadata like frequency (extrinsic state) is stored only at terminal nodes.
What's the case for Strategy over hardcoded sort?
Hardcoding sorted(results, key=lambda x: -x.frequency) inside the trie works for a prototype. But the moment you need recency ranking for a code editor, trending for a news site, or personalized ranking for logged-in users, you're piling conditionals into core search logic. Strategy extracts ranking into its own hierarchy. The trie produces raw candidates; the strategy decides their order. A new ranking algorithm is one new class.
Decorator vs. a simple filter chain?
A filter chain (list of functions) achieves similar composability. Decorator adds structure: each decorator is a class that can hold state (a spell-check dictionary, a personalization model), be independently tested, and be dynamically added or removed. The chain of decorators is also explicit in the object graph. You can inspect and modify it at runtime, which matters when different user contexts need different post-processing pipelines.
Interview Follow-ups
- "How would you distribute the trie across multiple servers?" Shard by first character or by consistent hashing on prefixes. Each shard holds a subtrie. A routing layer directs the prefix query to the right shard. For popular prefixes, replicate those subtries across multiple nodes.
- "How do you handle real-time updates when new trending terms appear?" Use a two-tier system. A small hot trie in memory holds recent high-frequency terms, updated in real time. A larger cold trie holds the full dictionary, rebuilt periodically. Merge results at query time. The hot trie is small enough to rebuild frequently.
- "How would you handle typos in the prefix?" Integrate edit-distance search. For each prefix, also search for words within Levenshtein distance 1 or 2. This is expensive on a raw trie, so use a BK-tree or precomputed fuzzy index alongside the main trie. The SpellCheckDecorator in our design is the hook point for this.
- "How do you scale to billions of queries per day?" Cache the top-k results for the most common prefixes (the top 10,000 prefixes handle 90% of traffic). Use a CDN for single-character prefixes. Rate-limit per user to prevent abuse. The trie itself is read-heavy, so replicate across many read replicas with infrequent write propagation.
Code Implementation
from __future__ import annotations
import time
from abc import ABC, abstractmethod
from collections import defaultdict
from dataclasses import dataclass, field
# ---------------------------------------------------------------------------
# Flyweight: TrieNode shares prefix paths across all inserted words
# ---------------------------------------------------------------------------
@dataclass
class TrieNode:
"""Flyweight node. Intrinsic state: position in the tree (character implied
by parent edge). Extrinsic state: frequency is stored per-word at the
terminal node only."""
children: dict[str, TrieNode] = field(default_factory=dict)
is_end: bool = False
frequency: int = 0
class Trie:
"""Core data structure. Insert words and collect completions for a prefix."""
def __init__(self) -> None:
self.root = TrieNode()
def insert(self, word: str, frequency: int = 1) -> None:
node = self.root
for ch in word:
if ch not in node.children:
node.children[ch] = TrieNode()
node = node.children[ch]
node.is_end = True
node.frequency += frequency
def _find_node(self, prefix: str) -> TrieNode | None:
node = self.root
for ch in prefix:
if ch not in node.children:
return None
node = node.children[ch]
return node
def search_prefix(self, prefix: str, limit: int = 10) -> list[Candidate]:
"""Return all completions under *prefix* up to *limit*."""
start = self._find_node(prefix)
if start is None:
return []
results: list[Candidate] = []
self._dfs(start, list(prefix), results, limit)
return results
def _dfs(
self,
node: TrieNode,
path: list[str],
results: list[Candidate],
limit: int,
) -> None:
if len(results) >= limit:
return
if node.is_end:
results.append(Candidate("".join(path), node.frequency))
for ch in sorted(node.children):
self._dfs(node.children[ch], path + [ch], results, limit)
if len(results) >= limit:
return
# ---------------------------------------------------------------------------
# Candidate value object
# ---------------------------------------------------------------------------
@dataclass
class Candidate:
word: str
frequency: int
timestamp: float = 0.0
# ---------------------------------------------------------------------------
# Strategy: pluggable ranking algorithms
# ---------------------------------------------------------------------------
class RankingStrategy(ABC):
@abstractmethod
def rank(self, candidates: list[Candidate]) -> list[Candidate]: ...
class FrequencyRanker(RankingStrategy):
"""Rank by descending frequency : most popular first."""
def rank(self, candidates: list[Candidate]) -> list[Candidate]:
return sorted(candidates, key=lambda c: -c.frequency)
class RecencyRanker(RankingStrategy):
"""Rank by most recently selected : freshest first."""
def __init__(self) -> None:
self._timestamps: dict[str, float] = {}
def record(self, word: str) -> None:
self._timestamps[word] = time.time()
def rank(self, candidates: list[Candidate]) -> list[Candidate]:
for c in candidates:
c.timestamp = self._timestamps.get(c.word, 0.0)
return sorted(candidates, key=lambda c: -c.timestamp)
class TrendingRanker(RankingStrategy):
"""Combine frequency and recency with a weighted score."""
def __init__(self, freq_weight: float = 0.6, recency_weight: float = 0.4) -> None:
self._fw = freq_weight
self._rw = recency_weight
self._timestamps: dict[str, float] = {}
def record(self, word: str) -> None:
self._timestamps[word] = time.time()
def rank(self, candidates: list[Candidate]) -> list[Candidate]:
now = time.time()
def score(c: Candidate) -> float:
ts = self._timestamps.get(c.word, 0.0)
recency = 1.0 / (1.0 + now - ts) if ts else 0.0
return self._fw * c.frequency + self._rw * recency * 100
return sorted(candidates, key=lambda c: -score(c))
# ---------------------------------------------------------------------------
# Decorator: composable post-processing layers
# ---------------------------------------------------------------------------
class ResultDecorator(ABC):
@abstractmethod
def apply(self, results: list[str]) -> list[str]: ...
class DedupDecorator(ResultDecorator):
"""Remove duplicate suggestions (case-insensitive)."""
def apply(self, results: list[str]) -> list[str]:
seen: set[str] = set()
out: list[str] = []
for r in results:
key = r.lower()
if key not in seen:
seen.add(key)
out.append(r)
return out
class CaseNormalizeDecorator(ResultDecorator):
"""Lowercase all suggestions for uniform display."""
def apply(self, results: list[str]) -> list[str]:
return [r.lower() for r in results]
# ---------------------------------------------------------------------------
# Observer: track user selections to feed back into ranking
# ---------------------------------------------------------------------------
class SearchListener(ABC):
@abstractmethod
def on_selection(self, query: str, selected: str) -> None: ...
class FrequencyBoostListener(SearchListener):
"""Boost frequency in the trie when a user picks a suggestion."""
def __init__(self, trie: Trie) -> None:
self._trie = trie
def on_selection(self, query: str, selected: str) -> None:
self._trie.insert(selected, frequency=1)
print(f" [Observer] Boosted '{selected}' frequency after selection")
# ---------------------------------------------------------------------------
# Factory: SuggestionProvider wires everything together
# ---------------------------------------------------------------------------
class SuggestionProvider:
"""Factory-created provider that combines trie search, ranking strategy,
decorator chain, and observer notification."""
def __init__(
self,
trie: Trie,
strategy: RankingStrategy,
decorators: list[ResultDecorator] | None = None,
listeners: list[SearchListener] | None = None,
) -> None:
self._trie = trie
self._strategy = strategy
self._decorators = decorators or []
self._listeners = listeners or []
def suggest(self, prefix: str, k: int = 5) -> list[str]:
candidates = self._trie.search_prefix(prefix, limit=50)
ranked = self._strategy.rank(candidates)[:k]
results = [c.word for c in ranked]
for decorator in self._decorators:
results = decorator.apply(results)
return results
def select(self, query: str, selected: str) -> None:
"""Notify observers that a user picked a suggestion."""
for listener in self._listeners:
listener.on_selection(query, selected)
@staticmethod
def create(
trie: Trie,
context: str = "search",
) -> SuggestionProvider:
"""Factory method: build the right provider for the context."""
if context == "code_editor":
strategy: RankingStrategy = RecencyRanker()
decorators: list[ResultDecorator] = [DedupDecorator()]
else:
strategy = FrequencyRanker()
decorators = [DedupDecorator(), CaseNormalizeDecorator()]
listener = FrequencyBoostListener(trie)
return SuggestionProvider(trie, strategy, decorators, [listener])
# ---------------------------------------------------------------------------
# Demo
# ---------------------------------------------------------------------------
if __name__ == "__main__":
trie = Trie()
words = [
("apple", 50), ("application", 30), ("applied", 20),
("app", 80), ("appetite", 10), ("banana", 40),
("band", 35), ("bandwidth", 25), ("ban", 15),
]
for word, freq in words:
trie.insert(word, freq)
print("=== Frequency-Ranked Search (default) ===")
provider = SuggestionProvider.create(trie, context="search")
results = provider.suggest("app", k=5)
print(f" prefix='app' -> {results}")
results = provider.suggest("ban", k=3)
print(f" prefix='ban' -> {results}")
print("\n=== Recency-Ranked Search (code editor) ===")
editor_provider = SuggestionProvider.create(trie, context="code_editor")
recency_ranker: RecencyRanker = editor_provider._strategy # type: ignore
recency_ranker.record("application")
recency_ranker.record("applied")
results = editor_provider.suggest("app", k=5)
print(f" prefix='app' -> {results}")
print("\n=== Decorator Effects ===")
trie.insert("Apple", 5) # duplicate with different case
raw_candidates = trie.search_prefix("app", limit=10)
raw_words = [c.word for c in raw_candidates]
print(f" Raw results: {raw_words}")
dedup = DedupDecorator()
deduped = dedup.apply(raw_words)
print(f" After DedupDecorator: {deduped}")
normalizer = CaseNormalizeDecorator()
normalized = normalizer.apply(deduped)
print(f" After CaseNormalizeDecorator: {normalized}")
print("\n=== Observer Feedback ===")
provider.select("app", "application")
node = trie._find_node("application")
print(f" 'application' frequency after selection: {node.frequency if node else 'N/A'}")
results_after = provider.suggest("app", k=5)
print(f" prefix='app' after boost -> {results_after}")
print("\n=== Trending Ranker ===")
trending = TrendingRanker()
trending.record("appetite")
trending_provider = SuggestionProvider(
trie, trending, [DedupDecorator(), CaseNormalizeDecorator()]
)
results = trending_provider.suggest("app", k=5)
print(f" prefix='app' (trending) -> {results}")
print("\nAll operations completed successfully.")Common Mistakes
- ✗Storing full strings at every node instead of sharing prefixes. That defeats Flyweight and wastes memory.
- ✗Hardcoding ranking logic inside the Trie. Violates SRP and can't adapt to different contexts.
- ✗Rebuilding the entire result list through all decorators on every keystroke. You need caching or debounce.
- ✗Not limiting result count early. Searching millions of completions and truncating at the end is wasteful.
Key Points
- ✓Flyweight: Trie nodes share prefix paths. 'apple' and 'application' share 4 nodes. The character (intrinsic state) is shared; frequency per context (extrinsic) is external.
- ✓Strategy: the ranking algorithm is hot-swappable. Search bar uses frequency, code editor uses recency.
- ✓Decorator: post-processing layers compose. Each wraps the result list without modifying the core search.
- ✓Observer: when a user clicks a suggestion, that feeds back as a frequency boost signal.