Computing LZ77 in Run-Compressed Space
Nicola Prezza, Alberto Policriti

TL;DR
This paper gives a space-efficient algorithm that computes the LZ77 factorization of a text from the run-length encoded Burrows-Wheeler transform (BWT) of the reversed text, greatly reducing working memory for highly repetitive texts and enabling space-efficient construction of repetition-aware self-indexes.
Contribution
It presents an algorithm that computes LZ77 in working space proportional to R, the number of runs in the BWT of the reversed text, so memory use shrinks as the input becomes more repetitive.
Findings
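LZ77 can be computed in O(R log n) bits of working space and O(n log R) time, where R is the number of runs in the BWT of the reversed text.
For extremely repetitive inputs, the working space can be as low as O(log n) bits.
Repetition-aware self-indexes that combine run-length encoded BWT and LZ77 can be built in O(R + z) words of working space, where z is the number of LZ77 phrases.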
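To make the quantity R concrete, the following minimal sketch counts the equal-letter runs in the BWT of the reversed text. It is a naive illustration, not the paper's method: it sorts all rotations, so it uses far more than O(R log n) bits. The function names, the `\x00` sentinel, and the choice to count the sentinel as its own run are assumptions made for this example.

```python
def bwt(text: str, sentinel: str = "\x00") -> str:
    """Naive BWT: sort all rotations of text + sentinel, take the last column.

    Assumes the sentinel does not occur in text and sorts below every
    other character. Quadratic space; for illustration only.
    """
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def count_runs(s: str) -> int:
    """Number of maximal runs of equal characters in s."""
    if not s:
        return 0
    return 1 + sum(1 for a, b in zip(s, s[1:]) if a != b)


def runs_in_bwt_of_reverse(text: str) -> int:
    """R for this sketch: runs in the BWT of the reversed text (sentinel included)."""
    return count_runs(bwt(text[::-1]))
```

For a unary text such as `"a" * 100`, this sketch yields a run count that does not grow with the text length, which is the kind of extremely repetitive input for which the O(log n)-bit bound applies.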
Abstract
In this paper, we show that the LZ77 factorization of a text T ∈ Σ^n can be computed in O(R log n) bits of working space and O(n log R) time, R being the number of runs in the Burrows-Wheeler transform of T reversed. For extremely repetitive inputs, the working space can be as low as O(log n) bits: exponentially smaller than the text itself. As a direct consequence of our result, we show that a class of repetition-aware self-indexes based on a combination of run-length encoded BWT and LZ77 can be built in asymptotically optimal O(R + z) words of working space, z being the size of the LZ77 parsing.
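As a point of reference for what is being computed, here is a minimal sketch of one common LZ77 variant, in which each phrase is the longest prefix of the remaining text that also starts at an earlier position (overlap allowed), followed by one explicit character. This brute-force version runs in quadratic time and keeps the whole text in memory, so it does not reflect the paper's algorithm or its space bounds. The function names and the triple layout `(source, length, next_char)` are assumptions for this example, and the paper's exact phrase definition may differ.

```python
from typing import List, Tuple

Phrase = Tuple[int, int, str]  # (source position or -1, match length, trailing char)


def lz77_factorize(text: str) -> List[Phrase]:
    """Brute-force LZ77 parse: longest earlier match plus one literal character."""
    phrases: List[Phrase] = []
    n = len(text)
    i = 0
    while i < n:
        best_len, best_src = 0, -1
        for src in range(i):
            length = 0
            # Stop one short of the end so a trailing character always remains.
            while i + length < n - 1 and text[src + length] == text[i + length]:
                length += 1
            if length > best_len:
                best_len, best_src = length, src
        phrases.append((best_src, best_len, text[i + best_len]))
        i += best_len + 1
    return phrases


def lz77_decode(phrases: List[Phrase]) -> str:
    """Rebuild the text; copying one character at a time handles overlapping sources."""
    out: List[str] = []
    for src, length, ch in phrases:
        for k in range(length):
            out.append(out[src + k])
        out.append(ch)
    return "".join(out)
```

The number of phrases returned by `lz77_factorize` plays the role of z in this sketch; decoding the phrases with `lz77_decode` reproduces the input, which is a convenient sanity check.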
Taxonomy
Topics: Algorithms and Data Compression · Semigroups and Automata Theory · Natural Language Processing Techniques
