TL;DR
This paper introduces r-comp, an optimal-time algorithm that constructs the run-length Burrows-Wheeler transform (RLBWT) in working space bounded by the number of BWT runs, which makes construction more efficient for highly repetitive strings.
Contribution
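The paper presents the first optimal-time RLBWT construction algorithm whose working space is bounded by the number of BWT runs. Prior construction algorithms either run in non-optimal time or need working space linearly proportional to the input length.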
Findings
Achieves $O(n + r \,\log r)$ time complexity, where $n$ is the length of the input string and $r$ is the number of equal-letter runs in its BWT
Uses $O(r \,\log n)$ bits of working space
Effective on real-world highly repetitive datasets
Abstract
The compression of highly repetitive strings (i.e., strings with many repetitions) has been a central research topic in string processing, and quite a few compression methods for these strings have been proposed thus far. Among them, an efficient compression format gathering increasing attention is the run-length Burrows-Wheeler transform (RLBWT), which is a run-length encoded BWT as a reversible permutation of an input string on the lexicographical order of suffixes. State-of-the-art construction algorithms of RLBWT have a serious issue with respect to (i) non-optimal computation time or (ii) a working space that is linearly proportional to the length of an input string. In this paper, we present r-comp, the first optimal-time construction algorithm of RLBWT in BWT-runs bounded space. That is, the computational complexity of r-comp is $O(n + r \,\log r)$ time and …
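To make the quantities $n$ and $r$ concrete, here is a minimal Python sketch of what an RLBWT is. It is not r-comp: it builds the BWT naively by sorting all rotations, which takes far more time and space than the bounds above. The function names and the `$` sentinel are assumptions chosen for illustration.

```python
from itertools import groupby


def bwt(text: str, sentinel: str = "$") -> str:
    """Naive BWT by sorting all rotations (illustration only, not r-comp).

    Assumes `sentinel` does not occur in `text` and sorts before every
    character of `text`.
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The BWT is the last column of the sorted rotation matrix.
    return "".join(rot[-1] for rot in rotations)


def run_length_encode(s: str) -> list[tuple[str, int]]:
    """Collapse each maximal run of equal characters into (char, length)."""
    return [(ch, sum(1 for _ in group)) for ch, group in groupby(s)]


def run_length_decode(runs: list[tuple[str, int]]) -> str:
    """Expand (char, length) pairs back into the plain string."""
    return "".join(ch * length for ch, length in runs)


if __name__ == "__main__":
    rlbwt = run_length_encode(bwt("ab" * 8))
    # n = 17 characters (including the sentinel), but only r = 3 runs.
    print(rlbwt)  # [('b', 8), ('$', 1), ('a', 8)]
```

On the repetitive input above, the 17-character BWT collapses to 3 runs, which is why working space measured in $r$ rather than $n$ matters for highly repetitive strings.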
