RLZ-r and LZ-End-r: Enhancing Move-r
Patrick Dinklage, Johannes Fischer, Lukas Nalbach, Jan Zumbrink

TL;DR
This paper enhances the r-index and Move-r data structures for string pattern matching by integrating compressed suffix arrays using Relative Lempel-Ziv and LZ-End schemes, significantly improving locate query speed and offering new space-performance trade-offs.
Contribution
It augments the r-index and Move-r with a compressed suffix array that supports efficient random access, using either Relative Lempel-Ziv (RLZ) or LZ-End compression. This speeds up locate queries and gives practical trade-offs between index size and query speed.
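To give a feel for why Relative Lempel-Ziv supports random access, here is a minimal, generic sketch in Python. It is not the paper's implementation: the function names (`rlz_parse`, `build_starts`, `access`), the quadratic match search, and the literal-phrase encoding are all assumptions made for illustration. The sequence is cut into phrases that each copy a run from a fixed reference, so element `i` can be read by finding its phrase and looking once into the reference.

```python
from bisect import bisect_right

def rlz_parse(seq, ref):
    """Greedily factorize seq into phrases copied from ref.

    A phrase is (ref_pos, length); a symbol absent from ref is stored
    as the literal phrase (symbol, 0). Naive quadratic match search.
    """
    phrases = []
    i = 0
    while i < len(seq):
        best_pos, best_len = -1, 0
        for p in range(len(ref)):
            l = 0
            while p + l < len(ref) and i + l < len(seq) and ref[p + l] == seq[i + l]:
                l += 1
            if l > best_len:
                best_pos, best_len = p, l
        if best_len == 0:
            phrases.append((seq[i], 0))  # literal phrase (illustrative encoding)
            i += 1
        else:
            phrases.append((best_pos, best_len))
            i += best_len
    return phrases

def build_starts(phrases):
    """Starting position in the original sequence of every phrase."""
    starts, pos = [], 0
    for _, length in phrases:
        starts.append(pos)
        pos += max(length, 1)  # a literal covers exactly one element
    return starts

def access(phrases, starts, ref, i):
    """Return element i of the original sequence without decompressing it."""
    k = bisect_right(starts, i) - 1  # phrase that covers position i
    src, length = phrases[k]
    if length == 0:
        return src  # literal phrase stores the symbol itself
    return ref[src + (i - starts[k])]

ref = [3, 1, 4, 1, 5]
seq = [1, 4, 1, 9, 3, 1]
phrases = rlz_parse(seq, ref)
starts = build_starts(phrases)
decoded = [access(phrases, starts, ref, i) for i in range(len(seq))]
```

A single access costs one binary search over the phrase starts plus one lookup in the reference. How the reference is chosen and how the suffix array values are encoded before parsing are not covered by this summary, so the sketch leaves both open.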
Findings
Locate queries are significantly faster with the new methods.
Different compression schemes offer distinct trade-offs.
Performance improvements are notable for patterns with many occurrences.
Abstract
In pattern matching on strings, a locate query asks for an enumeration of all the occurrences of a given pattern in a given text. The r-index [Gagie et al., 2018] is a recently presented compressed self-index that stores the text and auxiliary information in compressed space. With some modifications, locate queries can be answered in optimal time [Nishimoto & Tabei, 2021], which has recently been proven relevant in practice in the form of Move-r [Bertram et al., 2024]. However, there remains the practical bottleneck of evaluating the function Φ for every occurrence to report. This motivates enhancing the index by a compressed representation of the suffix array featuring efficient random access, trading off space for faster answering of locate queries [Puglisi & Zhukova, 2021]. In this work, we build upon this idea considering two suitable compression schemes: Relative Lempel-Ziv…
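The difference between the two ways of reporting occurrences can be illustrated with a toy Python sketch. It uses an uncompressed suffix array and a dictionary for Φ, where Φ maps the suffix array value at rank i to the value at rank i-1. Everything here is an assumption made for illustration (the helper names, the naive construction, the explicit Φ table); the actual indexes store these structures in compressed form.

```python
def suffix_array(text):
    """Naive suffix array construction; adequate for a toy example."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def sa_range(text, sa, pattern):
    """Half-open rank interval [lo, hi) of suffixes that start with pattern."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first rank whose length-m prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:  # first rank whose length-m prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return start, lo

def locate_with_phi(sa, phi, lo, hi):
    """Report occurrences by starting at the last rank and applying Phi repeatedly."""
    occ = []
    if lo < hi:
        pos = sa[hi - 1]  # stands in for the single known starting value
        occ.append(pos)
        for _ in range(hi - lo - 1):
            pos = phi[pos]  # one Phi evaluation per further occurrence
            occ.append(pos)
    return occ

def locate_with_sa(sa, lo, hi):
    """Report occurrences by direct random access into the suffix array."""
    return [sa[i] for i in range(lo, hi)]

text = "banana$"
sa = suffix_array(text)
phi = {sa[i]: sa[i - 1] for i in range(1, len(sa))}  # Phi(SA[i]) = SA[i-1]
lo, hi = sa_range(text, sa, "ana")
via_phi = locate_with_phi(sa, phi, lo, hi)
via_sa = locate_with_sa(sa, lo, hi)
```

Both functions report the same set of positions. The Φ-based walk must perform its evaluations one after another, whereas the direct variant reads each entry independently, which is the access pattern a randomly accessible compressed suffix array is meant to provide.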
