Fast Compressed Self-Indexes with Deterministic Linear-Time Construction
J. Ian Munro, Gonzalo Navarro, Yakov Nekrich

TL;DR
This paper presents a compressed suffix array that can be built in deterministic linear time and counts the occurrences of a pattern $P$ in $O(|P| + \log\log_w \sigma)$ time, faster than every other compressed index with deterministic linear-time construction.
Contribution
Introduces a compressed suffix array with deterministic linear-time construction whose counting queries are faster than those of all other compressed indexes that can be built in linear deterministic time.
Findings
Constructs in $O(n)$ deterministic time
Supports pattern counting in $O(|P| + \log\log_w \sigma)$ time
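Outperforms all other compressed indexes that can be built in linear deterministic time; the only faster indexes need randomized (expected linear-time) construction or more space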
Abstract
We introduce a compressed suffix array representation that, on a text $T$ of length $n$ over an alphabet of size $\sigma$, can be built in $O(n)$ deterministic time, within $O(n \log \sigma)$ bits of working space, and counts the number of occurrences of any pattern $P$ in $T$ in time $O(|P| + \log\log_w \sigma)$ on a RAM machine of $w = \Omega(\log n)$-bit words. This new index outperforms all the other compressed indexes that can be built in linear deterministic time, and some others. The only faster indexes can be built in linear time only in expectation, or require $\Theta(n \log n)$ bits. We also show that, by using $O(n \log \sigma)$ bits, we can build in linear time an index that counts in time $O(|P| / \log_\sigma n + \log n (\log\log n)^2)$, which is RAM-optimal for $w = \Theta(\log n)$ and sufficiently long patterns.
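To make the counting query concrete, here is a minimal sketch of counting the occurrences of a pattern with a plain, uncompressed suffix array. It is not the paper's data structure: the construction below sorts suffixes naively rather than in linear time, the array takes $\Theta(n \log n)$ bits, and each query costs $O(|P| \log n)$ rather than $O(|P| + \log\log_w \sigma)$. The function names `build_suffix_array` and `count_occurrences` are hypothetical and chosen only for illustration.

```python
def build_suffix_array(text):
    """Return the starting positions of all suffixes of text in lexicographic order.

    Naive illustration: sorting materialized suffixes is far from linear time.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def count_occurrences(text, sa, pattern):
    """Count occurrences of pattern in text using two binary searches over sa.

    All suffixes that start with pattern form one contiguous range of sa,
    so the count is the width of that range.
    """
    m = len(pattern)

    # First suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # First suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return lo - start


text = "banana"
sa = build_suffix_array(text)
print(count_occurrences(text, sa, "ana"))
```

The contiguous range of suffixes located by the two binary searches is the same notion of a suffix array interval that compressed indexes compute; the compressed structures differ in how they represent the array and how quickly they narrow down the interval.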
