Lempel-Ziv (LZ77) Factorization in Sublinear Time
Dominik Kempa, Tomasz Kociumaka

TL;DR
This paper introduces the first sublinear-time algorithm for LZ77 factorization, improving on the long-standing linear-time algorithms, and an efficient index for pattern occurrence queries.
Contribution
It presents the first o(n)-time algorithm for LZ77 factorization, breaking a linear-time barrier that stood for nearly 50 years, and develops an efficient indexing method for pattern occurrence queries.
Findings
Algorithm runs in O(n/√log n) time for binary strings.
Uses optimal O(n/log n) working space.
Generalizes to larger alphabets with similar efficiency.
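As a quick check on the bounds listed above for binary strings, the following worked relation (a sketch, with n the input length) shows why O(n/√log n) is sublinear and how far it sits from the n/log n term in the space bound:

```latex
\[
\frac{n/\sqrt{\log n}}{n} \;=\; \frac{1}{\sqrt{\log n}} \;\xrightarrow[n\to\infty]{}\; 0
\quad\Longrightarrow\quad
\frac{n}{\sqrt{\log n}} = o(n),
\qquad
\frac{n/\sqrt{\log n}}{n/\log n} \;=\; \sqrt{\log n}.
\]
```

So the running time is a √log n factor below linear and a √log n factor above n/log n.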
Abstract
Lempel-Ziv (LZ77) factorization is a fundamental problem in string processing: Greedily partition a given string T from left to right into blocks (called phrases) so that each phrase is either the leftmost occurrence of a letter or the longest prefix of the unprocessed suffix that has another occurrence earlier in T. Due to numerous applications, LZ77 factorization is one of the most studied problems on strings. In the 47 years since its inception, several algorithms were developed for different models of computation, including parallel, GPU, external-memory, and quantum. Remarkably, however, the complexity of the most basic variant is still not settled: All existing algorithms in the RAM model run in Ω(n) time, which is a Θ(log n) factor away from the lower bound of Ω(n/log n) (following from the necessity to read the input, which takes …
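To make the greedy definition in the abstract concrete, here is a minimal sketch of a naive factorization in Python. It is not the paper's algorithm: it takes quadratic time or worse and only illustrates what a phrase is. The function name `lz77_factorize` and the (start, length, source) triple format are assumptions chosen for this illustration.

```python
def lz77_factorize(text):
    """Naive LZ77 factorization (illustrative only, not sublinear).

    Returns a list of (start, length, source) triples. `source` is the
    starting position of an earlier occurrence of the phrase, or None
    when the phrase is the leftmost occurrence of a letter.
    """
    n = len(text)
    phrases = []
    i = 0
    while i < n:
        best_len, best_src = 0, None
        # Try every earlier starting position j < i. The earlier
        # occurrence may overlap the phrase itself (j + k can reach i).
        for j in range(i):
            k = 0
            while i + k < n and text[j + k] == text[i + k]:
                k += 1
            if k > best_len:
                best_len, best_src = k, j
        if best_len == 0:
            # No earlier occurrence: a single-letter phrase.
            phrases.append((i, 1, None))
            i += 1
        else:
            phrases.append((i, best_len, best_src))
            i += best_len
    return phrases


# Usage: split "abaababa" into its phrases.
text = "abaababa"
print([text[s:s + l] for s, l, _ in lz77_factorize(text)])
# ['a', 'b', 'a', 'aba', 'ba']
```

Ties between equally long earlier occurrences are broken toward the smallest source position here; that choice is arbitrary and does not affect the phrase boundaries.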
Taxonomy
Topics: Cellular Automata and Applications
