Linear pattern matching on sparse suffix trees

Roman Kolpakov; Gregory Kucherov; Tatiana Starikovskaya

arXiv:1103.2613·cs.DS·March 19, 2015

TL;DR

This paper introduces a space-efficient index for packed strings based on sparse suffix trees, enabling faster pattern matching by exploiting character packing within computer words.

Contribution

It proposes an index for packed strings built on a sparse suffix tree with appropriately defined suffix links. The index occupies O(n / log_σ n) space, the same as the packed string itself, and supports pattern matching in O(m + r^2 + r · occ) time.

Findings

01

The index uses O(n / log_σ n) space, the same as the packed string itself.

02

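Pattern matching runs in O(m + r^2 + r · occ) time, where m is the pattern length, r is the number of characters actually stored in a word, and occ is the number of pattern occurrences.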

03

Packing several characters into one computer word both compresses the string and speeds up its processing, and the index is designed around that packed layout.
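To make the space figure concrete, here is a minimal Python sketch of character packing. It is an illustration only, not code from the paper: the helper names `chars_per_word` and `pack` are hypothetical, characters are assumed to be given as integer codes in `range(sigma)`, and a "word" is modelled as a Python integer holding `r` base-`sigma` digits.

```python
def chars_per_word(n: int, sigma: int) -> int:
    """Largest r with sigma**r <= n, i.e. floor(log_sigma n), but at least 1."""
    r, capacity = 0, 1
    while capacity * sigma <= n:
        capacity *= sigma
        r += 1
    return max(r, 1)


def pack(codes: list[int], sigma: int, r: int) -> list[int]:
    """Pack r character codes per word, reading each block as a base-sigma number.

    The last word may hold fewer than r characters; no padding is added.
    """
    words = []
    for i in range(0, len(codes), r):
        word = 0
        for c in codes[i:i + r]:
            word = word * sigma + c
        words.append(word)
    return words


# Usage: 8 characters over a 4-letter alphabet, 4 characters per word.
packed = pack([0, 1, 2, 3, 1, 0, 3, 2], sigma=4, r=4)
print(packed, len(packed))
```

With `r` close to log_σ n, a string of n characters fits in about n / log_σ n words, which is the space the index is stated to match.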

Abstract

Packing several characters into one computer word is a simple and natural way to compress the representation of a string and to speed up its processing. Exploiting this idea, we propose an index for a packed string, based on a sparse suffix tree \cite{KU-96} with appropriately defined suffix links. Assuming, under the standard unit-cost RAM model, that a word can store up to $\log_{\sigma} n$ characters ($\sigma$ being the alphabet size), our index takes $O(n / \log_{\sigma} n)$ space, i.e., the same space as the packed string itself. The resulting pattern matching algorithm runs in time $O(m + r^{2} + r \cdot occ)$, where $m$ is the length of the pattern, $r$ is the actual number of characters stored in a word, and $occ$ is the number of pattern occurrences.
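The following Python sketch illustrates the general idea of searching a sparse index. It is not the paper's algorithm and does not reach its time bound. It assumes the sampled suffixes are those starting at multiples of r (one per packed word), replaces the sparse suffix tree with a sorted list of those suffixes, and only handles patterns of length at least r. The names `build_sparse_index` and `find_occurrences` are hypothetical.

```python
from bisect import bisect_left


def build_sparse_index(text: str, r: int):
    """Sort the suffixes that start at multiples of r (assumed sampling)."""
    starts = sorted(range(0, len(text), r), key=lambda i: text[i:])
    # Explicit suffix copies keep the toy simple; a real index would not store them.
    suffixes = [text[i:] for i in starts]
    return starts, suffixes


def find_occurrences(text: str, r: int, index, pattern: str) -> list[int]:
    """Report all start positions of pattern in text using only sampled suffixes."""
    starts, suffixes = index
    if len(pattern) < r:
        raise ValueError("this sketch only handles patterns of length >= r")
    hits = []
    # An occurrence at position p lies d = q - p characters before the next
    # sampled position q, with 0 <= d < r, so try every such alignment.
    for d in range(r):
        head, tail = pattern[:d], pattern[d:]
        # Suffixes having `tail` as a prefix are contiguous in sorted order.
        i = bisect_left(suffixes, tail)
        while i < len(suffixes) and suffixes[i].startswith(tail):
            q = starts[i]
            # Verify the d pattern characters that precede the sampled position.
            if q >= d and text[q - d:q] == head:
                hits.append(q - d)
            i += 1
    return sorted(hits)


# Usage
text = "abracadabra"
index = build_sparse_index(text, 3)
print(find_occurrences(text, 3, index, "abra"))
```

The toy index keeps only about n / r entries, so it has to try up to r alignments of the pattern against the sampled positions and verify the unsampled characters separately.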
