Text Indexing and Searching in Sublinear Time
J. Ian Munro, Gonzalo Navarro, Yakov Nekrich

TL;DR
This paper presents the first text index that can be built in time sublinear in the text length and queried in time sublinear in the pattern length, and extends these results to the secondary memory model.
Contribution
Introduces the first text index that can be both built and queried in sublinear time. It relies on a novel text sampling based on difference covers, which supports efficient longest common prefix (LCP) computations.
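The general difference-cover idea can be sketched as follows. This is not the paper's sampling scheme, whose parameters are not reproduced here. It uses the textbook cover D = {1, 2, 4} modulo v = 7 and the property that makes covers useful for LCPs: for any two positions i and j, some shift k < v lands both i + k and j + k on sampled positions. An LCP query then needs at most k direct character comparisons plus one lookup among sampled positions. All names are hypothetical, and the sampled-LCP oracle is computed naively, standing in for whatever fast structure the paper actually builds.

```python
def is_difference_cover(D, v):
    """True if every residue 0..v-1 equals (a - b) mod v for some a, b in D."""
    return {(a - b) % v for a in D for b in D} == set(range(v))


def shift_into_cover(i, j, D, v):
    """Smallest k in [0, v) with (i + k) % v and (j + k) % v both in D.

    Such a k always exists when D is a difference cover modulo v.
    """
    cover = set(D)
    for k in range(v):
        if (i + k) % v in cover and (j + k) % v in cover:
            return k
    raise ValueError("D is not a difference cover modulo v")


def naive_lcp(text, i, j):
    """Length of the longest common prefix of text[i:] and text[j:]."""
    n, l = len(text), 0
    while i + l < n and j + l < n and text[i + l] == text[j + l]:
        l += 1
    return l


def make_sampled_oracle(text, D, v):
    """Hypothetical LCP oracle that only answers for sampled positions.

    It is computed naively here; a real index would precompute it.
    """
    cover = set(D)

    def oracle(a, b):
        assert a % v in cover and b % v in cover, "unsampled position"
        return naive_lcp(text, a, b)

    return oracle


def lcp_via_cover(text, i, j, D, v, sampled_lcp):
    """LCP of suffixes i and j using < v direct comparisons plus one oracle call."""
    k = shift_into_cover(i, j, D, v)
    n = len(text)
    for t in range(k):
        # Mismatch (or end of text) within the first k characters.
        if i + t >= n or j + t >= n or text[i + t] != text[j + t]:
            return t
    # i + k and j + k are both sampled positions, so the oracle finishes the job.
    return k + sampled_lcp(i + k, j + k)


if __name__ == "__main__":
    D, v = {1, 2, 4}, 7
    text = "abracadabraabracadabra"
    oracle = make_sampled_oracle(text, D, v)
    print(is_difference_cover(D, v))                  # True
    print(lcp_via_cover(text, 0, 11, D, v, oracle))   # 11
```

Only positions whose residue modulo v lies in D need LCP information, here 3 out of every 7. This is why a difference cover lets a sampled structure answer LCP queries on arbitrary position pairs.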
Findings
Index can be built in o(n) time for a text of length n and queried in o(q) time for a pattern of length q
Counts the occurrences of a pattern and can then locate each occurrence
Extends to secondary memory with near-optimal I/O performance
Abstract
We introduce the first index that can be built in o(n) time for a text of length n, and can also be queried in o(q) time for a pattern of length q. On an alphabet of size […], our index uses […] bits, is built in […] deterministic time, and computes the number of occurrences of the pattern in time […]. Each such occurrence can then be found in […] time. By slightly increasing the space and construction time, to […] and […], respectively, for any constant […], we can find the pattern occurrences in time […]. We build on a novel text…
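The abstract separates counting the occurrences of a pattern from locating each one. The classical way to support both queries is a plain suffix array searched by binary search, sketched below. This is not the paper's index: the naive construction here is far from sublinear, and each query costs O(q log n) character comparisons. The function names are hypothetical.

```python
def build_suffix_array(text):
    """Naive suffix array: starting positions sorted by suffix (illustration only)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def pattern_range(text, sa, p):
    """Half-open range [start, end) of suffix-array entries whose suffix starts with p."""
    m = len(p)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose length-m prefix is >= p
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < p:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:  # first suffix whose length-m prefix is > p
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= p:
            lo = mid + 1
        else:
            hi = mid
    return start, lo


def count(text, sa, p):
    """Number of occurrences of p: the width of its suffix-array range."""
    start, end = pattern_range(text, sa, p)
    return end - start


def locate(text, sa, p):
    """Text positions of all occurrences of p, in increasing order."""
    start, end = pattern_range(text, sa, p)
    return sorted(sa[start:end])


if __name__ == "__main__":
    text = "abracadabra"
    sa = build_suffix_array(text)
    print(count(text, sa, "abra"), locate(text, sa, "abra"))  # 2 [0, 7]
```

Counting only needs the two ends of the range, while locating must read every entry inside it. This is why the cost of locating grows with the number of occurrences.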
Taxonomy
Topics: Algorithms and Data Compression · Semigroups and Automata Theory · Cellular Automata and Applications
