Computing LZ77 in Run-Compressed Space

Nicola Prezza; Alberto Policriti

arXiv:1510.06257 · cs.DS · October 22, 2015 · 2 cites

Open Access

TL;DR

This paper introduces a space-efficient algorithm that computes the LZ77 factorization of a text within working space proportional to the number of runs in its Burrows-Wheeler transform, substantially reducing memory for highly repetitive texts and enabling space-efficient construction of repetition-aware self-indexes.

Contribution

It presents an algorithm that computes LZ77 in O(R log n) bits of working space and O(n log R) time, where R is the number of runs in the Burrows-Wheeler transform (BWT) of the reversed text.

Findings

01

LZ77 can be computed in O(R log n) bits of working space and O(n log R) time.

02

For extremely repetitive inputs, the working space can be as low as O(log n) bits, exponentially smaller than the text itself.

03

Repetition-aware self-indexes that combine a run-length encoded BWT with LZ77 can be built in O(R + z) words of working space, where z is the size of the LZ77 parsing.
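
To make the parameter R in the findings above concrete, here is a minimal Python sketch that counts the runs in the BWT of the reversed text. It uses the textbook sorted-rotations construction, which needs far more than O(R log n) bits and is not the paper's algorithm. The function names and the `\x00` sentinel are assumptions made for illustration only.

```python
def bwt(text: str, sentinel: str = "\x00") -> str:
    """Naive BWT via sorted rotations (illustration only, quadratic space).

    Assumption: `sentinel` does not occur in `text` and sorts before
    every other character.
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The BWT is the last column of the sorted rotation matrix.
    return "".join(rot[-1] for rot in rotations)


def count_runs(s: str) -> int:
    """Number of maximal blocks of equal consecutive characters."""
    return sum(1 for i, c in enumerate(s) if i == 0 or c != s[i - 1])


def runs_of_reversed_bwt(text: str) -> int:
    """R as described here: runs in the BWT of the reversed text."""
    return count_runs(bwt(text[::-1]))


if __name__ == "__main__":
    for sample in ["a" * 64, "ab" * 32, "abracadabra"]:
        print(len(sample), runs_of_reversed_bwt(sample))
```

On a unary text such as `"a" * 64` the run count stays at 2 (one run of `a` plus the sentinel) regardless of length, which is the regime in which O(R log n) bits shrinks to O(log n) bits.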

Abstract

In this paper, we show that the LZ77 factorization of a text T ∈ Σ^n can be computed in O(R log n) bits of working space and O(n log R) time, R being the number of runs in the Burrows-Wheeler transform of T reversed. For extremely repetitive inputs, the working space can be as low as O(log n) bits: exponentially smaller than the text itself. As a direct consequence of our result, we show that a class of repetition-aware self-indexes based on a combination of run-length encoded BWT and LZ77 can be built in asymptotically optimal O(R + z) words of working space, z being the size of the LZ77 parsing.
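
As a point of reference for the output the abstract describes, the following is a minimal quadratic-time Python sketch of an LZ77 factorization. It is not the run-compressed algorithm of the paper. It assumes one common LZ77 variant, in which each phrase is the longest prefix of the remaining text that also starts at an earlier position (self-overlap allowed), followed by one explicit character; the paper's exact definition may differ. All names are hypothetical.

```python
from typing import List, Tuple

Phrase = Tuple[int, int, str]  # (source position, copy length, next char)


def lz77_factorize(text: str) -> List[Phrase]:
    """Naive LZ77 parse; the phrase count z is len() of the result."""
    phrases: List[Phrase] = []
    i, n = 0, len(text)
    while i < n:
        best_len, best_src = 0, -1
        for src in range(i):  # try every earlier starting position
            length = 0
            # Stop one character early so an explicit symbol remains.
            while i + length < n - 1 and text[src + length] == text[i + length]:
                length += 1
            if length > best_len:
                best_len, best_src = length, src
        phrases.append((best_src, best_len, text[i + best_len]))
        i += best_len + 1
    return phrases


def lz77_decode(phrases: List[Phrase]) -> str:
    """Inverse of lz77_factorize, used here only as a sanity check."""
    out: List[str] = []
    for src, length, ch in phrases:
        for k in range(length):
            out.append(out[src + k])  # may read characters just written
        out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    for sample in ["a" * 8, "abababab", "abracadabra"]:
        parse = lz77_factorize(sample)
        assert lz77_decode(parse) == sample
        print(sample, len(parse), parse)
```

Under this variant, a unary string of length 8 parses into 2 phrases, so both z and the run count R stay small on extremely repetitive inputs, which is what makes an O(R + z)-word index attractive.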

Taxonomy

Topics: Algorithms and Data Compression · Semigroups and Automata Theory · Natural Language Processing Techniques