Fast Algorithms for Exact String Matching

Srikrishnan Divakaran

arXiv:1509.09228 · cs.DS · October 1, 2015 · 1 cites

TL;DR

This paper introduces exact string matching algorithms that preprocess the pattern to identify a rarely occurring substring and use it to speed up the search, while keeping the search time linear in the text length in the worst case.

Contribution

The paper proposes algorithms that preprocess the pattern to find a sparse (rarely occurring) substring, achieving a worst-case search time linear in the text length and an expected search time that improves with the length of that substring, up to a limit set by the alphabet size.
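The preprocess-then-search idea can be illustrated with a minimal filter-and-verify sketch. This is not the paper's algorithm: the listing does not define how the sparse substring is chosen, so the sketch substitutes a hypothetical stand-in heuristic (the longest window of pairwise-distinct characters) and relies on Python's built-in `str.find`. The function names `longest_distinct_window` and `find_all` are illustrative assumptions, and the sketch does not attain the paper's worst-case linear search time.

```python
def longest_distinct_window(pattern: str) -> tuple[int, int]:
    """Return (start, length) of the longest substring with no repeated character.

    Hypothetical stand-in for the paper's sparse-substring preprocessing.
    """
    last_seen: dict[str, int] = {}
    best_start, best_len = 0, 0
    window_start = 0
    for i, ch in enumerate(pattern):
        # Shrink the window if ch already occurs inside it.
        if ch in last_seen and last_seen[ch] >= window_start:
            window_start = last_seen[ch] + 1
        last_seen[ch] = i
        if i - window_start + 1 > best_len:
            best_start, best_len = window_start, i - window_start + 1
    return best_start, best_len


def find_all(pattern: str, text: str) -> list[int]:
    """Return every start index at which pattern occurs in text."""
    if not pattern or len(pattern) > len(text):
        return []
    offset, length = longest_distinct_window(pattern)
    anchor = pattern[offset:offset + length]
    matches = []
    # Filter: locate the anchor; verify: compare the full pattern there.
    pos = text.find(anchor, offset)
    while pos != -1:
        start = pos - offset
        if text.startswith(pattern, start):
            matches.append(start)
        pos = text.find(anchor, pos + 1)
    return matches
```

For example, `find_all("abcab", "xabcabcabz")` returns `[1, 4]`, including the overlapping occurrence. The design intuition is that an anchor which rarely appears in the text triggers few full verifications, which is the role the paper assigns to its sparse substring.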

Findings

01

Preprocessing identifies a rarely occurring substring in the pattern.

02

Search time is linear in the length of the text in the worst case.

03

Expected search time depends on the length of the sparse substring of the pattern and on the alphabet size.

Abstract

Given a pattern string $P$ of length $n$ and a query string $T$ of length $m$, where the characters of $P$ and $T$ are drawn from an alphabet of size $\Delta$, the exact string matching problem consists of finding all occurrences of $P$ in $T$. For this problem, we present algorithms that in $O(n\Delta^{2})$ time pre-process $P$ to essentially identify $\mathit{sparse}(P)$, a rarely occurring substring of $P$, and then use it to find occurrences of $P$ in $T$ efficiently. Our algorithms require a worst-case search time of $O(m)$ and an expected search time of $O(m / \min(|\mathit{sparse}(P)|, \Delta))$, where $|\mathit{sparse}(P)|$ is at least $\delta$ (i.e. the number of distinct characters in $P$), and for most pattern strings it is observed to be $\Omega(n^{1/2})$.
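To make the expected-time bound concrete, here is a worked instance with hypothetical values that are not taken from the paper: a text of length $m = 10^{6}$, a sparse substring of length 10, and a DNA-sized alphabet with $\Delta = 4$. Ignoring the constant factors hidden by the $O(\cdot)$ notation:

$$
\frac{m}{\min(|\mathit{sparse}(P)|,\ \Delta)} = \frac{10^{6}}{\min(10,\ 4)} = \frac{10^{6}}{4} = 2.5 \times 10^{5}
$$

With a byte-sized alphabet, $\Delta = 256$, the same sparse substring gives $10^{6} / \min(10,\ 256) = 10^{5}$. Under these assumptions, the alphabet size is the limiting term for small alphabets, and the length of the sparse substring is the limiting term for large ones.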

Taxonomy

Topics: Algorithms and Data Compression · DNA and Biological Computing · Semigroups and Automata Theory