Efficient LZ78 factorization of grammar compressed text
Hideo Bannai, Shunsuke Inenaga, Masayuki Takeda

TL;DR
This paper introduces an efficient algorithm for computing the LZ78 factorization directly from a grammar-compressed text represented as an SLP, significantly improving performance on compressible data.
Contribution
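The authors develop an algorithm that computes the LZ78 factorization directly from an SLP, without first decompressing the text, and refine it so that the dominant term of the running time depends either on the length of the longest LZ78 factor or on how much redundancy the SLP captures.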
Findings
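The improved algorithm runs in $O(nL + m\log N)$ time, where $n$ is the SLP size, $N$ the text length, $m$ the number of LZ78 factors, and $L$ the length of the longest LZ78 factor.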
Performance improves with higher redundancy and smaller SLP size.
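For a constant-size alphabet, the redundancy-based variant is asymptotically at least as fast as a linear-time algorithm on the uncompressed text, and can be faster when the text is compressible.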
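A short calculation illustrates why the $m\log N$ term stays within linear time for a constant alphabet. It assumes the standard bound $m = O(N/\log_\sigma N)$ on the number of LZ78 factors of a length-$N$ text over an alphabet of size $\sigma$:

$$
m\log N = O\!\left(\frac{N}{\log_\sigma N}\cdot\log N\right) = O(N\log\sigma) = O(N) \quad \text{when } \sigma = O(1).
$$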
Abstract
We present an efficient algorithm for computing the LZ78 factorization of a text, where the text is represented as a straight line program (SLP), which is a context free grammar in the Chomsky normal form that generates a single string. Given an SLP of size $n$ representing a text $S$ of length $N$, our algorithm computes the LZ78 factorization of $S$ in $O(n\sqrt{N} + m\log N)$ time and $O(n\sqrt{N} + m)$ space, where $m$ is the number of resulting LZ78 factors. We also show how to improve the algorithm so that the $n\sqrt{N}$ term in the time and space complexities becomes either $nL$, where $L$ is the length of the longest LZ78 factor, or $(N - \alpha)$ where $\alpha \geq 0$ is a quantity which depends on the amount of redundancy that the SLP captures with respect to substrings of $S$ of a certain length. Since $m = O(N/\log_\sigma N)$ where $\sigma$ is the alphabet size, the latter is…
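To make the two objects in the abstract concrete, here is a minimal Python sketch of the naive baseline: expand an SLP to its text, then compute the LZ78 factorization with a dictionary trie. This is not the paper's algorithm, which avoids full decompression. The rule encoding, the variable names `X1`..`X6`, and the function names are illustrative assumptions.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical SLP encoding (an assumption for this sketch): each variable maps
# either to a single character (terminal rule) or to a pair of variables
# (binary rule), mirroring Chomsky normal form.
Rule = Union[str, Tuple[str, str]]

# Example SLP that generates the single string "abaababa".
EXAMPLE_SLP: Dict[str, Rule] = {
    "X1": "a",
    "X2": "b",
    "X3": ("X1", "X2"),  # ab
    "X4": ("X3", "X1"),  # aba
    "X5": ("X4", "X3"),  # abaab
    "X6": ("X5", "X4"),  # abaababa
}


def expand(slp: Dict[str, Rule], start: str) -> str:
    """Naively decompress the SLP by memoized recursion on its rules."""
    memo: Dict[str, str] = {}

    def derive(var: str) -> str:
        if var not in memo:
            rule = slp[var]
            if isinstance(rule, str):
                memo[var] = rule
            else:
                memo[var] = derive(rule[0]) + derive(rule[1])
        return memo[var]

    return derive(start)


def lz78_factorize(text: str) -> List[str]:
    """Return the LZ78 factors of an uncompressed text.

    Each factor is the longest previous factor that matches at the current
    position, extended by one character; only the last factor may be an
    exact repeat of an earlier one.
    """
    root: dict = {}          # trie of factors seen so far: char -> child node
    factors: List[str] = []
    node = root
    start = 0                # start position of the factor being built
    for i, ch in enumerate(text):
        if ch in node:
            node = node[ch]  # still inside a known factor, keep extending
        else:
            node[ch] = {}    # new factor = known prefix + one new character
            factors.append(text[start:i + 1])
            node = root
            start = i + 1
    if start < len(text):
        factors.append(text[start:])  # leftover suffix matches an old factor
    return factors


text = expand(EXAMPLE_SLP, "X6")
factors = lz78_factorize(text)
# text is "abaababa" and factors is ["a", "b", "aa", "ba", "ba"]
```

Here $n = 6$ rules describe a text of length $N = 8$ with $m = 5$ factors and longest factor length $L = 2$. The baseline touches every character of the expanded text, which is the cost the paper's approach aims to avoid on compressible inputs.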
