Lempel-Ziv Parsing in External Memory
Juha Kärkkäinen, Dominik Kempa, Simon J. Puglisi

TL;DR
This paper introduces the first external memory algorithm for LZ77 parsing, enabling efficient processing of massive data sets beyond internal memory limits, which is crucial for large-scale text indexing and data compression.
Contribution
The paper presents a novel external memory algorithm for LZ77 parsing, addressing scalability issues of previous in-memory algorithms for large data sets.
Findings
The algorithm is fast in practice.
Enables processing of data larger than internal memory.
Facilitates development of large-scale text indexes.
Abstract
For decades, computing the LZ factorization (or LZ77 parsing) of a string has been a requisite and computationally intensive step in many diverse applications, including text indexing and data compression. Many algorithms for LZ77 parsing have been discovered over the years; however, despite the increasing need to apply LZ77 to massive data sets, no algorithm to date scales to inputs that exceed the size of internal memory. In this paper we describe the first algorithm for computing the LZ77 parsing in external memory. Our algorithm is fast in practice and will allow the next generation of text indexes to be realised for massive strings and string collections.
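For readers unfamiliar with the term, the following minimal sketch shows what an LZ77 parsing is, using the standard definition: each phrase is the longest prefix of the remaining text that also occurs starting at an earlier position (overlap with the phrase itself allowed), or a single literal character if there is no such occurrence. This is a naive in-memory illustration only, not the external memory algorithm of the paper; the function names `lz77_factorize` and `lz77_decode` and the phrase encoding are assumptions made for this example.

```python
def lz77_factorize(text):
    """Naive in-memory LZ77 factorization (illustration only, quadratic or worse).

    Returns a list of phrases. A phrase is either (char, 0) for a literal
    character, or (pos, length) meaning "copy `length` characters starting
    at earlier position `pos`".
    """
    phrases = []
    n = len(text)
    i = 0
    while i < n:
        best_len, best_pos = 0, -1
        # Try every earlier starting position j < i as a copy source.
        for j in range(i):
            length = 0
            # The source may overlap the phrase being built (j + length >= i).
            while i + length < n and text[j + length] == text[i + length]:
                length += 1
            if length > best_len:
                best_len, best_pos = length, j
        if best_len == 0:
            phrases.append((text[i], 0))   # no earlier occurrence: literal
            i += 1
        else:
            phrases.append((best_pos, best_len))
            i += best_len
    return phrases


def lz77_decode(phrases):
    """Rebuild the original text from the phrases produced above."""
    out = []
    for first, length in phrases:
        if length == 0:
            out.append(first)
        else:
            # Copy one character at a time so overlapping copies work.
            for k in range(length):
                out.append(out[first + k])
    return "".join(out)


if __name__ == "__main__":
    parsing = lz77_factorize("zzzzzipzip")
    print(parsing)
    print(lz77_decode(parsing))
```

The sketch keeps the whole text in memory and scans every earlier position for each phrase, which is the kind of approach that stops being usable once the input exceeds internal memory, the setting the paper addresses.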
