What is Boyer-Moore String Search?
it finds the first occurrence of a pattern in a text — basically a custom text.find(pattern).
the trick is it skips positions instead of checking one by one. brute force checks every position left to right. Boyer-Moore starts comparing from the right side of the pattern, and when it doesn’t match, it jumps forward — sometimes by a lot.
not the same as Boyer-Moore Voting Algorithm — same guys (Robert S. Boyer & J Strother Moore, UT Austin), completely different problem. string search came first (1977), voting came later (1981).
How it works (in my own words)
imagine you’re looking for “WORLD” in a really long sentence.
you put “WORLD” at the start and check the last letter first — the ‘D’. if the text has a ‘Z’ there, you already know none of “WORLD” can match at this position. and since ‘Z’ doesn’t even appear in “WORLD”, you can skip forward by the entire length of the pattern.
that’s the whole idea. compare from right, jump smart.
The important thing: two indices
this is the thing that tripped me up the most. you need two separate indices:
start— where the pattern is sitting on top of the textj— which character in the pattern you’re currently comparing (right to left)
the comparison: text[start + j] vs pat[j]
1text: a b c d e
2pat: b c d
3 0 1 2
4 ↑ j starts here (K-1 = 2), goes left
don’t use one variable for both. i did that and pat[3] crashed because start > 0 pushed the index past the pattern length. lesson learned the hard way.
Walkthrough
find “bcd” in “abcde”:
first, build a last map from the pattern — each char’s rightmost index:
1last = {'b': 0, 'c': 1, 'd': 2}
1Step 1: start=0, compare from right
2 text: a b c d e
3 pat: b c d
4 ↑ j=2: text[2]='c' vs pat[2]='d' → mismatch!
5
6 bad char = 'c', last['c'] = 1
7 shift = j - last['c'] = 2 - 1 = 1
8
9Step 2: start=1, compare from right
10 text: a b c d e
11 pat: b c d
12 ↑ j=2: 'd' vs 'd' ✓
13 ↑ j=1: 'c' vs 'c' ✓
14 ↑ j=0: 'b' vs 'b' ✓
15
16 all matched → return 1 ✅
find “xyz” in “abcde”:
1start=0: text[2]='c' vs pat[2]='z' → mismatch
2 'c' not in pattern → shift = j+1 = 3
3
4start=3: pattern needs 3 chars but only 2 left → return -1 ✅
Shift calculation
when mismatch at pattern index j:
1def compute_shift(j, bad_char, last):
2 idx = last.get(bad_char, -1)
3 if idx == -1:
4 return j + 1 # char not in pattern, big jump
5 shift = j - idx
6 return max(shift, 1) # at least move 1
| Case | What happens | Shift |
|---|---|---|
| bad char not in pattern | nothing to align to | j + 1 |
| bad char is in pattern, before j | align it | j - idx |
| bad char is in pattern, after j | can’t go backwards | 1 (minimum) |
Code: iterative (while loops)
1def boyer_moore(text, pat):
2 last = {}
3 for idx, c in enumerate(pat):
4 last[c] = idx
5
6 i = 0
7 while i <= len(text) - len(pat):
8 j = len(pat) - 1
9 while j >= 0 and text[i + j] == pat[j]:
10 j -= 1
11
12 if j < 0:
13 return i
14
15 bad_char = text[i + j]
16 idx = last.get(bad_char, -1)
17 if idx == -1:
18 i += j + 1
19 else:
20 i += max(j - idx, 1)
21
22 return -1
Code: recursive (no loops — NUS TCX1002 T04 Q8)
same algorithm but purely recursive, since the assignment didn’t allow loops:
1def compute_shift(j, bad_char, last):
2 idx = last.get(bad_char, -1)
3 if idx == -1:
4 return j + 1
5 shift = j - idx
6 if shift < 1:
7 shift = 1
8 return shift
9
10def find_jump(text, pat):
11 N, K = len(text), len(pat)
12
13 def build_last(i, m):
14 if i >= K: return m
15 m[pat[i]] = i
16 return build_last(i + 1, m)
17
18 last = build_last(0, {})
19
20 def find_pattern(start):
21 if start + K > N: return -1
22
23 def compare(j):
24 if j < 0: return -1
25 if text[start + j] != pat[j]: return j
26 return compare(j - 1)
27
28 diff = compare(K - 1)
29 if diff == -1:
30 return start
31
32 shift = compute_shift(diff, text[start + diff], last)
33 return find_pattern(start + shift)
34
35 return find_pattern(0)
compare(j) is nested inside find_pattern so it can see start through closure. returns -1 for “all matched” to avoid confusion with mismatch at index 0.
Mistakes i made
these are all real bugs from my implementation, not hypothetical:
| what i did | why it broke | fix |
|---|---|---|
| one variable for text + pattern index | pat[3] crash when start > 0 | keep start and j separate |
compare returns 0 for match AND mismatch at 0 | can’t tell them apart | use -1 for “all matched” |
boundary check start >= N | pattern tail overflows text | start + K > N |
text[diff] in compute_shift | diff is pattern index, not text position | text[start + diff] |
| treated shift as next position | shift is how far to move, not where to go | start + shift |
built last from text instead of pattern | last should map pattern chars only | iterate pat |
slice comparison text[start:end] == pat | works but doesn’t tell you WHERE mismatch is | need j for compute_shift |
Complexity
| Best | Worst | |
|---|---|---|
| Time | $O(N/K)$ | $O(NK)$ |
| Space | $O(K)$ | $O(K)$ |
best case: pattern’s last char never shows up in text, so you jump K positions every time. worst case: lots of partial matches like text = "aaaaaa", pat = "aaa".
Boyer vs Boyer
| String Search (this post) | Voting | |
|---|---|---|
| Year | 1977 | 1981 |
| Problem | find pattern in text | find majority element |
| Core idea | right-to-left + skip | cancel different elements |
| Time | O(N/K) best | O(N) |
| Space | O(K) | O(1) |
same duo, two classic algorithms.
References
- A Fast String Searching Algorithm (Boyer & Moore, 1977)
- Wikipedia: Boyer-Moore String Search
- NUS TCX1002 Tutorial 04 Q8