Approximating LZ77 via Small-Space Multiple-Pattern Matching

Johannes Fischer, Travis Gagie, Paweł Gawrychowski, and Tomasz Kociumaka

arXiv:1504.06647 · cs.DS · September 11, 2015

TL;DR

This paper introduces a space-efficient algorithm for approximate LZ77 parsing, built on a generalization of Karp-Rabin matching to multiple patterns. It uses only $O(z)$ working space, where $z$ is the number of phrases in the optimal parse, while keeping the running time quasilinear.

Contribution

It generalizes Karp-Rabin string matching to handle multiple patterns in small space and applies this to approximate LZ77 parsing with improved space efficiency.
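
For orientation, here is a minimal sketch of the classical Karp-Rabin idea that the paper generalizes. It is not the paper's algorithm: it assumes all patterns have the same length, stores their fingerprints in a dictionary, and verifies each candidate by direct comparison. The function names, the modulus, and the fixed seed are illustrative choices, not taken from the paper.

```python
import random

MOD = (1 << 61) - 1  # Mersenne prime used as the fingerprint modulus (an assumption)


def fingerprint(s, base):
    """Polynomial fingerprint of s modulo MOD."""
    h = 0
    for ch in s:
        h = (h * base + ord(ch)) % MOD
    return h


def find_equal_length_patterns(text, patterns, seed=0):
    """Return (position, pattern) pairs for every occurrence in text.

    Simplification: every pattern must have the same length.
    """
    if not patterns:
        return []
    length = len(patterns[0])
    if any(len(p) != length for p in patterns):
        raise ValueError("this sketch only handles equal-length patterns")
    if length == 0 or length > len(text):
        return []

    base = random.Random(seed).randrange(256, MOD - 1)

    # Map each fingerprint to the patterns that produce it.
    table = {}
    for p in set(patterns):
        table.setdefault(fingerprint(p, base), []).append(p)

    top = pow(base, length - 1, MOD)  # weight of the window's leading character
    h = fingerprint(text[:length], base)
    hits = []
    for i in range(len(text) - length + 1):
        for p in table.get(h, ()):
            if text[i:i + length] == p:  # explicit check rules out false positives
                hits.append((i, p))
        if i + length < len(text):
            # Slide the window: drop text[i], append text[i + length].
            h = ((h - ord(text[i]) * top) * base + ord(text[i + length])) % MOD
    return hits
```

For example, `find_equal_length_patterns("abracadabra", ["abr", "cad"])` reports `"abr"` at positions 0 and 7 and `"cad"` at position 4. The explicit comparison makes this sketch always correct at the cost of keeping the patterns in memory; the paper's result instead returns correct answers with high probability.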

Findings

01

Achieves $O(n \log n + m)$ time with $O(s)$ space for multiple-pattern matching.

02

Provides an approximation of the LZ77 parse with at most $(1+\varepsilon)z$ phrases in $O(\varepsilon^{-1} n \log n)$ time.

03

Reduces working space from the $\Omega(n/\operatorname{polylog} n)$ used by previous quasilinear-time algorithms to $O(z)$, where $z$ can be exponentially small in $n$.

Abstract

We generalize Karp-Rabin string matching to handle multiple patterns in $O(n \log n + m)$ time and $O(s)$ space, where $n$ is the length of the text and $m$ is the total length of the $s$ patterns, returning correct answers with high probability. As a prime application of our algorithm, we show how to approximate the LZ77 parse of a string of length $n$. If the optimal parse consists of $z$ phrases, using only $O(z)$ working space we can return a parse consisting of at most $(1 + \varepsilon) z$ phrases in $O(\varepsilon^{-1} n \log n)$ time, for any $\varepsilon \in (0, 1]$. As previous quasilinear-time algorithms for LZ77 use $\Omega(n / \operatorname{polylog} n)$ space, but $z$ can be exponentially small in $n$, these improvements in space are substantial.
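
To make the quantity $z$ concrete, the following naive sketch computes a greedy LZ77-style parse. It assumes one common variant of the definition (each phrase is the longest prefix of the remaining text that also starts at an earlier position, with overlaps allowed, or a single fresh character); the paper's exact definition may differ in detail. It runs in quadratic time or worse and is not space-efficient, so it only illustrates what is being approximated, not how the paper does it. The name `lz77_parse` is hypothetical.

```python
def lz77_parse(text):
    """Greedy LZ77-style parse of text, returned as a list of phrases."""
    phrases = []
    n = len(text)
    i = 0
    while i < n:
        best = 0
        for j in range(i):  # every earlier starting position is a candidate source
            length = 0
            # Extend the match; the source may overlap the phrase being built.
            while i + length < n and text[j + length] == text[i + length]:
                length += 1
            best = max(best, length)
        step = max(best, 1)  # no earlier occurrence: emit a single character
        phrases.append(text[i:i + step])
        i += step
    return phrases
```

Under this variant, `lz77_parse("abababab")` yields the three phrases `"a"`, `"b"`, `"ababab"`, so $z = 3$ for a string of length 8. An approximation with $\varepsilon = 1$ would be allowed up to $(1 + \varepsilon) z = 6$ phrases for the same string.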

Taxonomy

Topics: Algorithms and Data Compression · Semigroups and Automata Theory · Natural Language Processing Techniques