A Fast and Small Subsampled R-index
Dustin Cobas, Travis Gagie, Gonzalo Navarro

TL;DR
The paper introduces the sr-index, a space-efficient variant of the r-index for repetitive texts that maintains fast pattern matching while reducing space usage, outperforming most existing compressed indexes.
Contribution
It proposes the sr-index, a novel subsampled r-index that reduces space complexity with a controlled increase in query time, supported by theoretical guarantees and empirical validation.
Findings
The sr-index uses 1.5-3.0 times less space than the r-index.
It outperforms most compressed indexes in time and space on repetitive texts.
Lempel-Ziv indexes achieve better compression but are significantly slower.
Abstract
The r-index (Gagie et al., JACM 2020) represented a breakthrough in compressed indexing of repetitive text collections, outperforming its alternatives by orders of magnitude. Its space usage, O(r) where r is the number of runs in the Burrows-Wheeler Transform of the text, is however larger than Lempel-Ziv and grammar-based indexes, and makes it uninteresting in various real-life scenarios of milder repetitiveness. In this paper we introduce the sr-index, a variant that limits the space to O(min(r, n/s)) for a text of length n and a given parameter s, at the expense of multiplying by s the time per occurrence reported. The sr-index is obtained by carefully subsampling the text positions indexed by the r-index, in a way that we prove is still able to support pattern matching with guaranteed performance. Our experiments demonstrate that the…
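To make the quantities in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It builds the Burrows-Wheeler Transform of a small text naively by sorting rotations, counts its runs (the value usually written r), and evaluates the leading term min(r, n/s) for a text of length n and a subsampling parameter s. The function names `bwt`, `count_runs`, and `sample_budget` are hypothetical, the `$` sentinel is an assumption of this sketch, and constants hidden by the asymptotic notation are ignored.

```python
import math


def bwt(text: str, sentinel: str = "$") -> str:
    """Naive BWT: sort all rotations of text + sentinel, take the last column.

    Quadratic-space illustration only; real indexes build the BWT from a
    suffix array. Assumes the sentinel does not occur in the text and sorts
    before every other character.
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    t = text + sentinel
    rotations = sorted(t[i:] + t[:i] for i in range(len(t)))
    return "".join(rot[-1] for rot in rotations)


def count_runs(s: str) -> int:
    """Number of maximal runs of equal consecutive characters in s."""
    if not s:
        return 0
    return 1 + sum(1 for a, b in zip(s, s[1:]) if a != b)


def sample_budget(r: int, n: int, s: int) -> int:
    """Leading term min(r, ceil(n / s)); constants are ignored (assumption)."""
    if s < 1:
        raise ValueError("s must be a positive integer")
    return min(r, math.ceil(n / s))


if __name__ == "__main__":
    transformed = bwt("banana")          # 'annb$aa'
    r = count_runs(transformed)          # runs: a, nn, b, $, aa
    n = len(transformed)
    for s in (1, 2, 4):
        print(f"s={s}: r={r}, n={n}, budget term={sample_budget(r, n, s)}")
```

On highly repetitive inputs r is far smaller than n, so the min is dominated by r; on mildly repetitive inputs r approaches n and the n/s term is what caps the number of samples, which is the regime the abstract says the sr-index targets.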
