A Fast Heuristic for Exact String Matching
Srikrishnan Divakaran

TL;DR
This paper introduces a randomized heuristic for exact string matching that preprocesses the pattern to identify a sparse (rarely occurring) substring, which speeds up the search, especially for patterns with many distinct characters.
Contribution
The paper proposes a randomized heuristic that preprocesses the pattern to identify a sparse substring and then uses that substring to find all occurrences of the pattern in the text efficiently.
Findings
Preprocessing time is $O(n\delta)$.
Expected search time is $O(m / \min(|\text{sparse}(P)|, \Delta))$.
Expected sparse substring length is $\Omega(\Delta \log(\frac{2\Delta}{2\Delta-\delta}))$ for random patterns.
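To get a feel for how these bounds scale, the short sketch below evaluates them numerically. It ignores the constants hidden by the asymptotic notation and assumes a natural logarithm (a different base only changes the constant). The function names are hypothetical and do not come from the paper.

```python
import math


def sparse_length_lower_bound(alphabet_size: int, distinct_chars: int) -> float:
    """Evaluate Delta * ln(2*Delta / (2*Delta - delta)), ignoring constants.

    Assumption: natural log; the bound in the summary does not state a base.
    """
    if not 0 < distinct_chars <= alphabet_size:
        raise ValueError("need 0 < delta <= Delta")
    ratio = 2 * alphabet_size / (2 * alphabet_size - distinct_chars)
    return alphabet_size * math.log(ratio)


def search_cost_estimate(text_length: int, sparse_length: float, alphabet_size: int) -> float:
    """Evaluate m / min(|sparse(P)|, Delta), ignoring constants."""
    if sparse_length <= 0:
        raise ValueError("sparse_length must be positive")
    return text_length / min(sparse_length, alphabet_size)


if __name__ == "__main__":
    Delta = 256  # illustrative alphabet size (e.g. bytes)
    for delta in (16, 64, 256):
        print(delta, sparse_length_lower_bound(Delta, delta))
```

When $\delta$ is much smaller than $\Delta$, the expression is approximately $\delta/2$, and at $\delta = \Delta$ it equals $\Delta \ln 2$, so the bound grows with the number of distinct characters in the pattern.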
Abstract
Given a pattern string $P$ of length $n$ consisting of $\delta$ distinct characters and a query string $T$ of length $m$, where the characters of $P$ and $T$ are drawn from an alphabet $\Sigma$ of size $\Delta$, the exact string matching problem consists of finding all occurrences of $P$ in $T$. For this problem, we present a randomized heuristic that in $O(n\delta)$ time preprocesses $P$ to identify $\text{sparse}(P)$, a rarely occurring substring of $P$, and then uses it to find all occurrences of $P$ in $T$ efficiently. This heuristic has an expected search time of $O(m / \min(|\text{sparse}(P)|, \Delta))$, where $|\text{sparse}(P)|$ is at least $\delta$. We also show that for a pattern string $P$ whose characters are chosen uniformly at random from an alphabet of size $\Delta$, $E[|\text{sparse}(P)|]$ is $\Omega(\Delta \log(\frac{2\Delta}{2\Delta-\delta}))$.
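To make the problem statement concrete, here is a minimal sketch of exact matching together with a generic filter-then-verify search driven by a chosen substring of the pattern. This is not the paper's heuristic: how the sparse substring is selected is not described in this summary, so the caller supplies the substring's position explicitly, and both function names are illustrative assumptions.

```python
def find_all_occurrences(pattern: str, text: str) -> list[int]:
    """Baseline: start index of every occurrence of pattern in text (overlaps included)."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start)
        # Advance by one so overlapping occurrences are also reported.
        start = text.find(pattern, start + 1)
    return positions


def find_via_substring(pattern: str, text: str, offset: int, length: int) -> list[int]:
    """Filter-then-verify: locate pattern[offset:offset+length], then check the full pattern.

    Assumption: the caller picks the substring; the paper's selection rule is not modeled.
    """
    if length <= 0 or offset < 0 or offset + length > len(pattern):
        raise ValueError("substring must lie inside the pattern")
    anchor = pattern[offset:offset + length]
    positions = []
    hit = text.find(anchor)
    while hit != -1:
        start = hit - offset  # where the whole pattern would have to begin
        if start >= 0 and text.startswith(pattern, start):
            positions.append(start)
        hit = text.find(anchor, hit + 1)
    return positions
```

Both functions return the same positions for any valid substring choice. The intuition behind the filtered version is that the rarer the chosen substring is in the text, the fewer candidate positions need full verification.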
