String Matching (KMP)
The Intuition
Imagine you're reading a long novel, scanning for the phrase "abracadabra." You start comparing from page one, letter by letter. At some point you've matched "abracadab" (nine letters!) and then the next letter is wrong. Brute force says go back to where you started and try the next position. But think about what you've already seen. Those nine matched letters "abracadab" end with "ab," which is also how "abracadabra" begins. You don't need to start over from scratch. You can jump ahead and continue matching from the third character of your search phrase.
That's KMP's entire insight. When a mismatch happens after a partial match, some suffix of what you've already matched might be a prefix of the pattern. You can skip ahead to where that partial match could continue instead of rewinding the text. You never backtrack through the text. Every character in the text gets looked at once.
The failure function (also called the partial match table) precomputes these fallback positions. For each position i in your pattern, it answers: "if I fail at position i, what's the longest proper prefix of the pattern that matches a suffix of what I've matched so far?" Building this table is itself a pattern-matching problem, matching the pattern against itself. It takes O(m) time, where m is the pattern length, and then the actual search runs in O(n) time.
Picture the failure table as a set of bookmarks in your search phrase. When you stumble, instead of going back to page one, you flip to the right bookmark and keep reading forward. Every position in the pattern has at most one bookmark, and following bookmarks always moves you backward in the pattern but never backward in the text. That combination is why KMP guarantees linear time no matter what the input looks like, even adversarial cases like searching for "aaaaab" in "aaaaaaaaaa."
Substring search, pattern detection in strings, repeated pattern identification, plagiarism detection, or any problem requiring efficient text search without backtracking.
Pattern has complex regex features (use regex engine), need approximate/fuzzy matching (use edit distance), or searching for multiple patterns simultaneously (use Aho-Corasick).
Variations:
- KMP (Knuth-Morris-Pratt): Precomputes a failure table for guaranteed O(n + m) worst-case performance.
- Rabin-Karp: Uses rolling hash for average-case O(n + m); supports multi-pattern matching efficiently.
- Z-Algorithm: Computes the Z-array for each position, useful for pattern matching and string analysis.
- Empty pattern or empty text should be handled gracefully
- Pattern longer than the text returns no matches
- Pattern and text are identical (single full match)
- All characters in text and pattern are the same (e.g., "aaaa" in "aaaaaa")
- Single-character pattern or single-character text
- Pattern appears at the very beginning or very end of the text
- Overlapping matches (e.g., "aa" in "aaaa" yields indices 0, 1, 2)
Key Points
- •KMP builds a failure function (partial match table) to skip redundant comparisons
- •Rabin-Karp uses rolling hash for average-case O(n + m) multi-pattern matching
- •KMP guarantees worst-case O(n + m) with no backtracking on the text
- •The failure function represents the longest proper prefix that is also a suffix
- •Rabin-Karp can degrade to O(n * m) with many hash collisions
Code Template
1 def build_kmp_table(pattern):
2 """Build the KMP partial match (failure) table."""
3 m = len(pattern)
4 table = [0] * m
5 length = 0
6 i = 1
7
8 while i < m:
9 if pattern[i] == pattern[length]:
10 length += 1
11 table[i] = length
12 i += 1
13 elif length > 0:
14 length = table[length - 1]
15 else:
16 table[i] = 0
17 i += 1
18
19 return table
20
21 def kmp_search(text, pattern):
22 """Find all occurrences of pattern in text using KMP."""
23 n, m = len(text), len(pattern)
24 if m == 0:
25 return [0]
26
27 table = build_kmp_table(pattern)
28 results = []
29 j = 0 # pointer in pattern
30
31 for i in range(n):
32 while j > 0 and text[i] != pattern[j]:
33 j = table[j - 1]
34 if text[i] == pattern[j]:
35 j += 1
36 if j == m:
37 results.append(i - m + 1)
38 j = table[j - 1]
39
40 return results
41
42 def rabin_karp(text, pattern, base=256, mod=10**9 + 7):
43 """Find all occurrences of pattern in text using Rabin-Karp."""
44 n, m = len(text), len(pattern)
45 if m > n:
46 return []
47
48 # Compute hash of pattern and first window
49 p_hash = 0
50 t_hash = 0
51 power = 1
52
53 for i in range(m - 1):
54 power = (power * base) % mod
55
56 for i in range(m):
57 p_hash = (p_hash * base + ord(pattern[i])) % mod
58 t_hash = (t_hash * base + ord(text[i])) % mod
59
60 results = []
61 for i in range(n - m + 1):
62 if p_hash == t_hash and text[i:i + m] == pattern:
63 results.append(i)
64 if i < n - m:
65 t_hash = (t_hash - ord(text[i]) * power) % mod
66 t_hash = (t_hash * base + ord(text[i + m])) % mod
67
68 return resultsCommon Mistakes
- Off-by-one errors when building the KMP failure table
- Using a weak hash function in Rabin-Karp causing excessive collisions
- Forgetting to handle the case where the pattern is longer than the text
- Not resetting the failure pointer correctly when a mismatch occurs