Lightweight Lempel-Ziv Parsing
Juha Kärkkäinen, Dominik Kempa, Simon J. Puglisi

TL;DR
This paper presents a lightweight LZ77 factorization algorithm that trades space for time: it uses O(n/d) words of working space and O(dn) time for any d >= 1 (for polylogarithmic alphabet sizes). It is best suited to highly repetitive data and low-memory settings, and it comes with new methods for computing matching statistics.
Contribution
It introduces a new LZ77 parsing algorithm that uses O(n/d) words of working space and O(dn) time for any d >= 1, provides carefully engineered implementations of alternative lightweight approaches, and describes new methods for computing matching statistics.
Findings
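In extensive experiments, the new algorithm outperforms the alternative lightweight approaches in most cases.
It is particularly effective at the lowest memory levels and on highly repetitive data.
The paper describes new methods for computing matching statistics, which may be of independent interest.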
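For readers unfamiliar with the term, matching statistics record, for each position i of one string, the length of the longest prefix of the suffix starting at i that occurs somewhere in a second string, together with one such occurrence. The sketch below is a naive quadratic illustration of that standard definition only. It is not the method described in the paper, and the function name `matching_statistics` and the `(length, position)` output format are assumptions made for this example.

```python
def matching_statistics(pattern, text):
    """Naive matching statistics of `pattern` with respect to `text`.

    Returns a list of (length, position) pairs, one per position i of
    `pattern`: `length` is the longest prefix of pattern[i:] that occurs
    in `text`, and `position` is the leftmost start of such an occurrence
    (-1 when length is 0). Illustrative only; runs in O(|pattern| * |text|)
    comparisons per matched character in the worst case.
    """
    ms = []
    for i in range(len(pattern)):
        best_len, best_pos = 0, -1
        for p in range(len(text)):
            length = 0
            # Extend the match while both strings have characters left.
            while (i + length < len(pattern)
                   and p + length < len(text)
                   and pattern[i + length] == text[p + length]):
                length += 1
            # Strict comparison keeps the leftmost longest occurrence.
            if length > best_len:
                best_len, best_pos = length, p
        ms.append((best_len, best_pos))
    return ms
```

For example, `matching_statistics("abc", "xabyc")` reports that "ab" is the longest match at position 0, and that only single characters match at positions 1 and 2.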
Abstract
We introduce a new approach to LZ77 factorization that uses O(n/d) words of working space and O(dn) time for any d >= 1 (for polylogarithmic alphabet sizes). We also describe carefully engineered implementations of alternative approaches to lightweight LZ77 factorization. Extensive experiments show that the new algorithm is superior in most cases, particularly at the lowest memory levels and for highly repetitive data. As a part of the algorithm, we describe new methods for computing matching statistics which may be of independent interest.
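To make the object being computed concrete, here is a minimal sketch of the LZ77 factorization itself, following the usual definition: each phrase is either the first occurrence of a character, or the longest prefix of the remaining text that also occurs starting at an earlier position (the earlier occurrence may overlap the phrase). This brute-force version is only an illustration of the definition. It does not reflect the O(n/d)-space algorithm from the paper, and the names `lz77_factorize` and `lz77_decode` and the phrase encoding are assumptions made for this example.

```python
def lz77_factorize(text):
    """Brute-force LZ77 factorization (illustrative, not the paper's method).

    Each phrase is either (char, 0) for a character with no earlier
    occurrence, or (pos, length) meaning "copy `length` characters
    starting at earlier position `pos`".
    """
    phrases = []
    n = len(text)
    i = 0
    while i < n:
        best_len, best_pos = 0, -1
        # Try every earlier starting position; overlap with the phrase is allowed.
        for p in range(i):
            length = 0
            while i + length < n and text[p + length] == text[i + length]:
                length += 1
            if length > best_len:
                best_len, best_pos = length, p
        if best_len == 0:
            phrases.append((text[i], 0))   # literal phrase
            i += 1
        else:
            phrases.append((best_pos, best_len))
            i += best_len
    return phrases


def lz77_decode(phrases):
    """Rebuild the text from the phrases produced by lz77_factorize."""
    out = []
    for first, length in phrases:
        if length == 0:
            out.append(first)
        else:
            # Copy one character at a time so self-overlapping copies work.
            for k in range(length):
                out.append(out[first + k])
    return "".join(out)
```

On a highly repetitive input such as `"abababab"` the factorization has only three phrases: two literals followed by a single overlapping copy of length 6. This is the kind of input on which the abstract reports the largest gains.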
