Lightweight Lempel-Ziv Parsing

Juha Kärkkäinen; Dominik Kempa; Simon J. Puglisi

arXiv:1302.1064·cs.DS·December 11, 2020

TL;DR

This paper presents a lightweight LZ77 factorization algorithm that uses O(n/d) words of working space and O(dn) time for any d >= 1 (for polylogarithmic alphabet sizes), which makes it well suited to highly repetitive data and low-memory settings. It also describes new methods for computing matching statistics.

Contribution

It introduces an LZ77 factorization algorithm with a tunable space-time trade-off, describes carefully engineered implementations of alternative lightweight approaches, and presents new methods for computing matching statistics.

Findings

01

In extensive experiments, the new algorithm is superior to the alternative lightweight approaches in most cases.

02

Its advantage is greatest at the lowest memory levels and for highly repetitive data.

03

The paper includes new techniques for computing matching statistics.

Abstract

We introduce a new approach to LZ77 factorization that uses O(n/d) words of working space and O(dn) time for any d >= 1 (for polylogarithmic alphabet sizes). We also describe carefully engineered implementations of alternative approaches to lightweight LZ77 factorization. Extensive experiments show that the new algorithm is superior in most cases, particularly at the lowest memory levels and for highly repetitive data. As a part of the algorithm, we describe new methods for computing matching statistics which may be of independent interest.
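
For readers unfamiliar with the two notions named in the abstract, the following is a minimal sketch of what an LZ77 factorization and matching statistics are. It is a naive baseline written for clarity, not the paper's O(n/d)-space algorithm, and its running time is quadratic or worse. The function names (`lz77_factorize`, `lz77_decode`, `matching_statistics`) and the phrase encoding are assumptions made for illustration; the sketch assumes the common definition in which each phrase is either a character that has not occurred before or the longest prefix of the remaining text that also starts at an earlier position (overlaps allowed).

```python
def lz77_factorize(text):
    """Naive LZ77 factorization (illustrative only, not the paper's method).

    Returns a list of phrases. A phrase is (char, 0) for a character with no
    earlier occurrence, or (source_position, length) for a repeat.
    """
    n = len(text)
    phrases = []
    i = 0
    while i < n:
        best_len, best_pos = 0, -1
        # Try every earlier start position p; the match may overlap position i.
        for p in range(i):
            length = 0
            while i + length < n and text[p + length] == text[i + length]:
                length += 1
            if length > best_len:
                best_len, best_pos = length, p
        if best_len == 0:
            phrases.append((text[i], 0))  # literal phrase
            i += 1
        else:
            phrases.append((best_pos, best_len))  # repeat phrase
            i += best_len
    return phrases


def lz77_decode(phrases):
    """Rebuild the text from the phrases produced by lz77_factorize."""
    out = []
    for first, length in phrases:
        if length == 0:
            out.append(first)
        else:
            # Copy one character at a time so self-overlapping repeats work.
            for k in range(length):
                out.append(out[first + k])
    return "".join(out)


def matching_statistics(pattern, text):
    """For each position i of `pattern`, the length of the longest prefix of
    pattern[i:] that occurs somewhere in `text` (naive scan, illustrative only).
    """
    result = []
    for i in range(len(pattern)):
        best = 0
        for p in range(len(text)):
            length = 0
            while (i + length < len(pattern) and p + length < len(text)
                   and pattern[i + length] == text[p + length]):
                length += 1
            best = max(best, length)
        result.append(best)
    return result
```

Under these assumptions, `lz77_factorize("abababab")` yields two literal phrases followed by one self-overlapping repeat of length 6, which illustrates why highly repetitive inputs produce very few phrases.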
