Approximating LZ77 via Small-Space Multiple-Pattern Matching
Johannes Fischer, Travis Gagie, Paweł Gawrychowski, and Tomasz Kociumaka

TL;DR
This paper introduces a space-efficient algorithm for approximate LZ77 parsing using generalized multiple-pattern matching, significantly reducing space requirements while maintaining near-linear time complexity.
Contribution
It generalizes Karp-Rabin string matching to handle multiple patterns in small space and applies this to approximate LZ77 parsing with improved space efficiency.
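As a rough illustration of the fingerprinting idea behind Karp-Rabin matching with several patterns, here is a minimal sketch. It is not the paper's algorithm: it assumes all patterns have the same length, keeps the patterns in memory, and verifies every fingerprint hit by direct comparison instead of relying on high-probability correctness. The function name `find_first_occurrences` and the parameters `base` and `mod` are hypothetical choices made for this example.

```python
def find_first_occurrences(text, patterns, base=256, mod=(1 << 61) - 1):
    """Map each pattern to its leftmost start index in text, or None.

    Simplifying assumption (not from the paper): all patterns share one
    non-zero length, so a single rolling fingerprint over the text suffices.
    """
    result = {p: None for p in patterns}
    if not patterns:
        return result
    length = len(patterns[0])
    if length == 0 or any(len(p) != length for p in patterns):
        raise ValueError("sketch assumes equal-length, non-empty patterns")
    if length > len(text):
        return result

    def fingerprint(s):
        # Polynomial (Karp-Rabin style) fingerprint of s modulo mod.
        h = 0
        for ch in s:
            h = (h * base + ord(ch)) % mod
        return h

    # Fingerprint table: one entry per distinct pattern fingerprint.
    table = {}
    for p in patterns:
        table.setdefault(fingerprint(p), []).append(p)

    high = pow(base, length - 1, mod)  # weight of the character leaving the window
    h = fingerprint(text[:length])
    for i in range(len(text) - length + 1):
        for p in table.get(h, ()):
            # Verify the hit explicitly to rule out fingerprint collisions.
            if result[p] is None and text[i:i + length] == p:
                result[p] = i
        if i + length < len(text):
            # Slide the window one position to the right.
            h = ((h - ord(text[i]) * high) * base + ord(text[i + length])) % mod
    return result
```

For example, `find_first_occurrences("abracadabra", ["abr", "cad", "xyz"])` reports positions 0 and 4 for the first two patterns and `None` for the third. Handling patterns of different lengths within the stated space bound is the harder part that the paper addresses.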
Findings
Achieves $O(n \log n + m)$ time with $O(s)$ space for multiple-pattern matching.
Provides an approximation of the LZ77 parse with at most $(1+\varepsilon)z$ phrases in $O(\varepsilon^{-1} n \log n)$ time.
Reduces space from the previous $\Omega(n/\mathrm{polylog}\, n)$ to $O(z)$, where $z$ can be exponentially small in $n$.
Abstract
We generalize Karp-Rabin string matching to handle multiple patterns in $O(n \log n + m)$ time and $O(s)$ space, where $n$ is the length of the text and $m$ is the total length of the $s$ patterns, returning correct answers with high probability. As a prime application of our algorithm, we show how to approximate the LZ77 parse of a string of length $n$. If the optimal parse consists of $z$ phrases, using only $O(z)$ working space we can return a parse consisting of at most $(1+\varepsilon)z$ phrases in $O(\varepsilon^{-1} n \log n)$ time, for any $\varepsilon \in (0,1]$. As previous quasilinear-time algorithms for LZ77 use $\Omega(n/\mathrm{polylog}\, n)$ space, but $z$ can be exponentially small in $n$, these improvements in space are substantial.
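To make the quantity $z$ concrete, the following sketch computes a greedy LZ77 parse by brute force. It assumes one common self-referential variant of LZ77, in which each phrase is either the longest prefix of the remaining text that also starts at an earlier position (overlap allowed) or a single fresh character; the paper's precise definition may differ. It runs in quadratic time or worse and uses linear space, so it only illustrates what is being approximated, not the small-space algorithm. The name `lz77_parse` is hypothetical.

```python
def lz77_parse(text):
    """Greedy LZ77 parse by brute force; returns the list of phrases.

    Assumed variant: a phrase is the longest prefix of text[i:] that also
    starts at some position j < i (the copy may overlap the phrase itself),
    or a single character if no such prefix exists.
    """
    phrases = []
    n = len(text)
    i = 0
    while i < n:
        best = 0
        for j in range(i):
            # Length of the match between the suffixes starting at j and i.
            k = 0
            while i + k < n and text[j + k] == text[i + k]:
                k += 1
            best = max(best, k)
        step = max(best, 1)  # a fresh character forms a phrase of length 1
        phrases.append(text[i:i + step])
        i += step
    return phrases
```

On `"abababab"` this yields the three phrases `a`, `b`, `ababab`, so $z = 3$ under this variant. An approximate parse in the sense of the abstract would be allowed at most $(1+\varepsilon)z$ phrases.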
Taxonomy
Topics: Algorithms and Data Compression · Semigroups and Automata Theory · Natural Language Processing Techniques
