Time and Space Efficient Lempel-Ziv Factorization based on Run Length Encoding
Jun'ichi Yamamoto, Hideo Bannai, Shunsuke Inenaga, Masayuki Takeda

TL;DR
This paper introduces off-line and on-line algorithms for Lempel-Ziv factorization that work on the run length encoding (RLE) of the input, reducing both running time and extra space for strings that RLE compresses well.
Contribution
It presents novel off-line and on-line algorithms for Lempel-Ziv factorization based on RLE, achieving sublinear space complexity for highly compressible strings.
Findings
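Both algorithms run in O(N + n log n) time and O(n) extra space, where N is the length of the string and n is the number of RLE factors.
The string is converted to RLE in O(N) time and O(1) extra space (excluding the output).
To the authors' knowledge, these are the first algorithms requiring only o(N) extra space for RLE-compressible strings.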
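To make the RLE conversion step concrete, here is a minimal sketch in Python. It is not taken from the paper; the function name `rle` and the (character, run length) pair representation are assumptions chosen for illustration. Written as a generator, it makes one left-to-right pass over the input and keeps only the current character and a counter, which matches the O(N) time and O(1) extra space (excluding the output) stated above.

```python
from typing import Iterable, Iterator, Tuple


def rle(text: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (character, run_length) pairs for maximal runs in text.

    Illustrative sketch only: the name and output format are assumptions,
    not the paper's notation.
    """
    current = None  # character of the run being scanned
    count = 0       # length of that run so far
    for ch in text:
        if ch == current:
            count += 1
        else:
            if current is not None:
                yield current, count  # previous run has ended
            current, count = ch, 1
    if current is not None:
        yield current, count          # emit the final run


if __name__ == "__main__":
    print(list(rle("aaabbbbc")))
```

For the input `aaabbbbc`, N is 8 and the number of RLE factors n is 3.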
Abstract
We propose a new approach for calculating the Lempel-Ziv factorization of a string, based on run length encoding (RLE). We present a conceptually simple off-line algorithm based on a variant of suffix arrays, as well as an on-line algorithm based on a variant of directed acyclic word graphs (DAWGs). Both algorithms run in O(N + n log n) time and O(n) extra space, where N is the size of the string and n ≤ N is the number of RLE factors. The time dependency on N is only in the conversion of the string to RLE, which can be computed very efficiently in O(N) time and O(1) extra space (excluding the output). When the string is compressible via RLE, i.e., n = o(N), our algorithms are, to the best of our knowledge, the first algorithms which require only o(N) extra space while running in o(N log N) time.
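As a point of reference for what the algorithms compute, the following is a naive Python sketch of a Lempel-Ziv factorization. It is not the paper's suffix-array or DAWG method, and it works on the raw string rather than on its RLE. It assumes the self-referencing variant of the factorization, in which each factor is the longest prefix of the remaining text that also starts at an earlier position (overlap allowed), or a single fresh character if no such prefix exists. The name `lz_factorize` is hypothetical.

```python
from typing import List


def lz_factorize(s: str) -> List[str]:
    """Naive Lempel-Ziv factorization (assumed self-referencing variant).

    Brute force, at least quadratic in len(s); for illustration only.
    """
    factors: List[str] = []
    n = len(s)
    i = 0
    while i < n:
        best = 0
        # Try every earlier start position j and extend the match greedily.
        for j in range(i):
            k = 0
            while i + k < n and s[j + k] == s[i + k]:
                k += 1
            best = max(best, k)
        length = max(best, 1)  # a new character forms a factor of length 1
        factors.append(s[i:i + length])
        i += length
    return factors


if __name__ == "__main__":
    print(lz_factorize("abab"))
    print(lz_factorize("aaaa"))
```

Concatenating the returned factors always reproduces the input. The brute-force search touches every position of the raw string many times, which is the kind of cost the RLE-based algorithms are designed to avoid on strings with long runs.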
Taxonomy
Topics: Algorithms and Data Compression · Network Packet Processing and Optimization · Natural Language Processing Techniques
