String Matching (KMP)

The Intuition

Imagine you're reading a long novel, scanning for the phrase "abracadabra." You start comparing from page one, letter by letter. At some point you've matched "abracadab" (nine letters!) and then the next letter is wrong. Brute force says go back to where you started and try the next position. But think about what you've already seen. Those nine matched letters "abracadab" end with "ab," which is also how "abracadabra" begins. You don't need to start over from scratch. You can jump ahead and continue matching from the third character of your search phrase.

That's KMP's entire insight. When a mismatch happens after a partial match, some suffix of what you've already matched might be a prefix of the pattern. You can skip ahead to where that partial match could continue instead of rewinding the text. You never backtrack through the text. Every character in the text gets looked at once.

The failure function (also called the partial match table) precomputes these fallback positions. For each position i in your pattern, it answers: "if I fail at position i, what's the longest proper prefix of the pattern that matches a suffix of what I've matched so far?" Building this table is itself a pattern-matching problem, matching the pattern against itself. It takes O(m) time, where m is the pattern length, and then the actual search runs in O(n) time.

Picture the failure table as a set of bookmarks in your search phrase. When you stumble, instead of going back to page one, you flip to the right bookmark and keep reading forward. Every position in the pattern has at most one bookmark, and following bookmarks always moves you backward in the pattern but never backward in the text. That combination is why KMP guarantees linear time no matter what the input looks like, even adversarial cases like searching for "aaaaab" in "aaaaaaaaaa."

When to use

Substring search, pattern detection in strings, repeated pattern identification, plagiarism detection, or any problem requiring efficient text search without backtracking.

When NOT to use

Pattern has complex regex features (use regex engine), need approximate/fuzzy matching (use edit distance), or searching for multiple patterns simultaneously (use Aho-Corasick).

Pattern Recognition

You need to find one string inside another without brute-force O(n * m) costThe problem asks for repeated substrings or periodic patterns within a stringYou need to match a pattern across multiple texts (Rabin-Karp with multi-pattern)The problem involves checking if a string can be formed by repeating a substringPrefix and suffix relationships within a string are central to the solution

Variations:

KMP (Knuth-Morris-Pratt): Precomputes a failure table for guaranteed O(n + m) worst-case performance.
Rabin-Karp: Uses rolling hash for average-case O(n + m); supports multi-pattern matching efficiently.
Z-Algorithm: Computes the Z-array for each position, useful for pattern matching and string analysis.

Edge Cases

Empty pattern or empty text should be handled gracefully
Pattern longer than the text returns no matches
Pattern and text are identical (single full match)
All characters in text and pattern are the same (e.g., "aaaa" in "aaaaaa")
Single-character pattern or single-character text
Pattern appears at the very beginning or very end of the text
Overlapping matches (e.g., "aa" in "aaaa" yields indices 0, 1, 2)

Practice Problems

EasyFind the Index of First Occurrence

EasyRepeated Substring Pattern

HardShortest Palindrome

Key Points

•KMP builds a failure function (partial match table) to skip redundant comparisons
•Rabin-Karp uses rolling hash for average-case O(n + m) multi-pattern matching
•KMP guarantees worst-case O(n + m) with no backtracking on the text
•The failure function represents the longest proper prefix that is also a suffix
•Rabin-Karp can degrade to O(n * m) with many hash collisions

Code Template

 1  def build_kmp_table(pattern):
 2      """Build the KMP partial match (failure) table."""
 3      m = len(pattern)
 4      table = [0] * m
 5      length = 0
 6      i = 1
 7  
 8      while i < m:
 9          if pattern[i] == pattern[length]:
10              length += 1
11              table[i] = length
12              i += 1
13          elif length > 0:
14              length = table[length - 1]
15          else:
16              table[i] = 0
17              i += 1
18  
19      return table
20  
21  def kmp_search(text, pattern):
22      """Find all occurrences of pattern in text using KMP."""
23      n, m = len(text), len(pattern)
24      if m == 0:
25          return [0]
26  
27      table = build_kmp_table(pattern)
28      results = []
29      j = 0  # pointer in pattern
30  
31      for i in range(n):
32          while j > 0 and text[i] != pattern[j]:
33              j = table[j - 1]
34          if text[i] == pattern[j]:
35              j += 1
36          if j == m:
37              results.append(i - m + 1)
38              j = table[j - 1]
39  
40      return results
41  
42  def rabin_karp(text, pattern, base=256, mod=10**9 + 7):
43      """Find all occurrences of pattern in text using Rabin-Karp."""
44      n, m = len(text), len(pattern)
45      if m > n:
46          return []
47  
48      # Compute hash of pattern and first window
49      p_hash = 0
50      t_hash = 0
51      power = 1
52  
53      for i in range(m - 1):
54          power = (power * base) % mod
55  
56      for i in range(m):
57          p_hash = (p_hash * base + ord(pattern[i])) % mod
58          t_hash = (t_hash * base + ord(text[i])) % mod
59  
60      results = []
61      for i in range(n - m + 1):
62          if p_hash == t_hash and text[i:i + m] == pattern:
63              results.append(i)
64          if i < n - m:
65              t_hash = (t_hash - ord(text[i]) * power) % mod
66              t_hash = (t_hash * base + ord(text[i + m])) % mod
67  
68      return results

Common Mistakes

Off-by-one errors when building the KMP failure table
Using a weak hash function in Rabin-Karp causing excessive collisions
Forgetting to handle the case where the pattern is longer than the text
Not resetting the failure pointer correctly when a mismatch occurs

Related Patterns

CrackingWalnuts

Tries & String AlgorithmsTemplate 2 of 3

Trie & StringsAdvancedO(n + m) time, O(m) space