Fast Algorithms for Exact String Matching
Srikrishnan Divakaran

TL;DR
This paper introduces algorithms for exact string matching that preprocess the pattern to identify a rarely occurring substring, then use that substring to speed up search in large texts while keeping the worst-case search time linear in the text length.
Contribution
The paper proposes algorithms that preprocess patterns to find sparse substrings, achieving worst-case linear search time and improved expected performance based on pattern characteristics.
Findings
Preprocessing identifies a rarely occurring substring in the pattern.
Search time is linear in the length of the text in the worst case.
Expected search time depends on the sparsity of the pattern's substring.
Abstract
Given a pattern string $P$ of length $n$ and a query string $T$ of length $m$, where the characters of $P$ and $T$ are drawn from an alphabet of size $\Delta$, the exact string matching problem consists of finding all occurrences of $P$ in $T$. For this problem, we present algorithms that in $O(n\Delta^2)$ time pre-process $P$ to essentially identify $sparse(P)$, a rarely occurring substring of $P$, and then use it to find occurrences of $P$ in $T$ efficiently. Our algorithms require a worst-case search time of $O(m)$, and an expected search time of $O(m/\min(|sparse(P)|, \Delta))$, where $|sparse(P)|$ is at least $\delta$ (i.e. the number of distinct characters in $P$), and for most pattern strings it is observed to be $\Omega(n^{1/2})$.
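The general idea in the abstract, anchoring the search on a rare part of the pattern, can be illustrated with a minimal Python sketch. This is not the paper's algorithm: it swaps in a much simpler heuristic that anchors on a single rare character instead of a rare substring, and it does not achieve the worst-case linear bound. The names `pick_rare_offset` and `find_all`, and the use of a sample string to estimate character frequencies, are assumptions made for illustration only.

```python
from collections import Counter
from typing import List, Optional


def pick_rare_offset(pattern: str, sample: str) -> int:
    """Return the offset in `pattern` of the character that is rarest in `sample`.

    Hypothetical helper: a single rare character stands in for the
    rarely occurring substring that the paper identifies.
    """
    freq = Counter(sample)
    # Characters absent from the sample have count 0, i.e. they are the rarest.
    return min(range(len(pattern)), key=lambda i: freq[pattern[i]])


def find_all(pattern: str, text: str, sample: Optional[str] = None) -> List[int]:
    """Return the start index of every occurrence of `pattern` in `text`."""
    if not pattern:
        return []
    n = len(pattern)
    # Assumption: with no separate sample, frequencies are estimated from the text.
    k = pick_rare_offset(pattern, text if sample is None else sample)
    anchor = pattern[k]
    hits = []
    # Jump between occurrences of the anchor; a match must place the anchor
    # at index start + k, so the scan can begin at index k.
    pos = text.find(anchor, k)
    while pos != -1:
        start = pos - k
        if start + n > len(text):
            break  # no later candidate can fit inside the text
        if text.startswith(pattern, start):
            hits.append(start)
        pos = text.find(anchor, pos + 1)
    return hits
```

For example, `find_all("aba", "ababa")` anchors on `b` and reports the overlapping matches at indices 0 and 2. The rarer the anchor is in the text, the fewer candidate positions need full verification, which is the intuition behind tying expected search time to the sparsity of the chosen part of the pattern.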
Taxonomy
Topics: Algorithms and Data Compression · DNA and Biological Computing · Semigroups and Automata Theory
