Text Indexing and Searching in Sublinear Time

J. Ian Munro; Gonzalo Navarro; Yakov Nekrich

arXiv:1712.07431 · cs.DS · July 16, 2019 · 1 citation

TL;DR

This paper presents a text index that can be built in o(n) time for a text of length n and queried in o(q) time for a pattern of length q, with an extension to secondary memory.

Contribution

Introduces the first text index that can be both built and queried in sublinear time, using a novel text sampling technique based on difference covers to compute longest common prefixes efficiently.
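
The property that makes difference covers useful for sampling can be illustrated with a small sketch. This is the classic difference cover construction, not the paper's novel sampling, whose details are not given on this page; the names `is_difference_cover` and `shift_to_sampled` and the choice of the cover {1, 2, 4} modulo 7 are assumptions made for illustration.

```python
from itertools import product


def is_difference_cover(cover, v):
    """True if every residue d in [0, v) equals (a - b) mod v for some a, b in cover."""
    differences = {(a - b) % v for a, b in product(cover, repeat=2)}
    return differences == set(range(v))


def shift_to_sampled(i, j, cover, v):
    """Smallest delta in [0, v) such that i + delta and j + delta are both sampled.

    A text position p counts as sampled when p mod v belongs to the cover.
    """
    members = set(cover)
    for delta in range(v):
        if (i + delta) % v in members and (j + delta) % v in members:
            return delta
    raise ValueError("cover is not a difference cover modulo v")


V = 7
COVER = (1, 2, 4)  # illustrative choice: a classic difference cover modulo 7

assert is_difference_cover(COVER, V)

# Any two text positions can be aligned to sampled positions within V steps.
for i, j in product(range(50), repeat=2):
    assert 0 <= shift_to_sampled(i, j, COVER, V) < V
```

Because any two positions reach a pair of sampled positions after fewer than v steps, a comparison of two suffixes only needs to inspect fewer than v symbols directly before it can fall back on information precomputed for the sampled positions.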

Findings

01 Index can be built in o(n) time and queried in o(q) time

02 Supports pattern occurrence counting and locating efficiently

03 Extends to secondary memory with near-optimal I/O performance
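
To make the difference between counting and locating concrete, here is a minimal baseline sketch using a plain suffix array with binary search. It is not the paper's index: construction here is far from sublinear and each query costs O(q log n) symbol comparisons. All function names are hypothetical.

```python
def build_suffix_array(text):
    """Naive suffix array: sort suffix start positions lexicographically."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def suffix_range(text, sa, pattern):
    """Half-open range [lo, hi) of suffix-array entries whose suffix starts with pattern."""
    q = len(pattern)

    def first_index(accept_equal):
        # Binary search over the length-q prefixes, which appear in sorted order.
        lo, hi = 0, len(sa)
        while lo < hi:
            mid = (lo + hi) // 2
            prefix = text[sa[mid]:sa[mid] + q]
            if prefix < pattern or (accept_equal and prefix == pattern):
                lo = mid + 1
            else:
                hi = mid
        return lo

    return first_index(False), first_index(True)


def count(text, sa, pattern):
    """Number of occurrences: only the width of the range is needed."""
    lo, hi = suffix_range(text, sa, pattern)
    return hi - lo


def locate(text, sa, pattern):
    """Starting positions of all occurrences, in increasing order."""
    lo, hi = suffix_range(text, sa, pattern)
    return sorted(sa[lo:hi])


text = "banana"
sa = build_suffix_array(text)
print(count(text, sa, "ana"))   # 2
print(locate(text, sa, "ana"))  # [1, 3]
```

Counting needs only the two range boundaries, while locating must additionally report one position per occurrence, which is why the two operations are usually given separate time bounds.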

Abstract

We introduce the first index that can be built in $o(n)$ time for a text of length $n$, and can also be queried in $o(q)$ time for a pattern of length $q$. On an alphabet of size $\sigma$, our index uses $O(n \sqrt{\log n \log \sigma})$ bits, is built in $O(n ((\log \log n)^{2} + \sqrt{\log \sigma}) / \sqrt{\log_{\sigma} n})$ deterministic time, and computes the number $occ$ of occurrences of the pattern in time $O(q / \log_{\sigma} n + \log n)$. Each such occurrence can then be found in $O(\sqrt{\log n \log \sigma})$ time. By slightly increasing the space and construction time, to $O(n (\sqrt{\log n \log \sigma} + \log \sigma \log^{\varepsilon} n))$ and $O(n \log^{3/2} \sigma / \log^{1/2 - \varepsilon} n)$, respectively, for any constant $0 < \varepsilon < 1/2$, we can find the $occ$ pattern occurrences in time $O(q / \log_{\sigma} n + \sqrt{\log_{\sigma} n} \log \log n + occ)$. We build on a novel text…
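
The $q / \log_{\sigma} n$ term in these bounds is typical of word-RAM algorithms that pack several symbols into one machine word and process a whole word per step. The sketch below illustrates only that packing idea, under the assumption of a 64-bit word and symbols given as integers in [0, sigma); it is not taken from the paper, and `pack` and `packed_equal` are hypothetical helper names.

```python
def pack(symbols, sigma, word_bits=64):
    """Pack integer symbols in [0, sigma) into words of word_bits bits each.

    Returns the list of words and the number of symbols stored per full word.
    """
    bits = max(1, (sigma - 1).bit_length())  # bits needed per symbol
    per_word = word_bits // bits
    words = []
    for start in range(0, len(symbols), per_word):
        word = 0
        for s in symbols[start:start + per_word]:
            word = (word << bits) | s
        words.append(word)
    return words, per_word


def packed_equal(a_words, b_words):
    """Compare two packed strings of equal symbol length.

    Returns (equal, number of word comparisons performed).
    """
    steps = 0
    for x, y in zip(a_words, b_words):
        steps += 1
        if x != y:
            return False, steps
    return len(a_words) == len(b_words), steps


# A pattern of q = 100 symbols over a 4-letter alphabet (2 bits per symbol).
pattern = [0, 1, 2, 3] * 25
words, per_word = pack(pattern, sigma=4)
print(per_word)                    # 32 symbols per 64-bit word
print(len(words))                  # 4 words instead of 100 symbols
print(packed_equal(words, words))  # (True, 4)
```

With 2-bit symbols, a 100-symbol pattern is compared in 4 word operations rather than 100 symbol operations, which is the kind of saving that a bound proportional to $q / \log_{\sigma} n$ reflects when the word size is $\Theta(\log n)$ bits.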

Taxonomy

Topics: Algorithms and Data Compression · Semigroups and Automata Theory · Cellular Automata and Applications