Comparison of LZ77-type Parsings

Dmitry Kosolobov; Arseny M. Shur

arXiv:1708.03558·cs.IT·May 24, 2018

TL;DR

This paper compares the LZ77 parsing variants found in the literature and proves tight bounds relating their phrase counts, depending on whether phrases are encoded as pairs or as triples and on whether a phrase may overlap its earlier occurrence.

Contribution

It proves bounds relating the numbers of phrases in the four greedy LZ77 parsing variants (pairs or triples, with or without overlaps) and gives series of examples showing that these bounds are tight.

Findings

01 Bounds on phrase counts for different parsing variants.

02 Relationships between overlap-allowing and non-overlapping parsings.

03 Examples demonstrating tightness of bounds.

Abstract

We investigate the relations between different variants of the LZ77 parsing existing in the literature. All of them are defined as greedily constructed parsings encoding each phrase by reference to a string occurring earlier in the input. They differ by the phrase encodings: encoded by pairs (length + position of an earlier occurrence) or by triples (length + position of an earlier occurrence + the letter following the earlier occurring part); and they differ by allowing or not allowing overlaps between the phrase and its earlier occurrence. For a given string of length $n$ over an alphabet of size $\sigma$, denote the numbers of phrases in the parsings allowing (resp., not allowing) overlaps by $z$ (resp., $\hat{z}$) for "pairs", and by $z_{3}$ (resp., $\hat{z}_{3}$) for "triples". We prove the following bounds and provide series of examples showing that these bounds are tight: $∙$ …
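The four parsings described in the abstract can be illustrated with a minimal brute-force sketch. It is not the authors' algorithm: the function names `longest_earlier_match` and `parse` are hypothetical, and two conventions are assumptions made here for illustration, namely that in the "pairs" variant a letter with no earlier occurrence forms a phrase of length 1, and that in the "triples" variant the last phrase may lack its trailing letter when the input ends.

```python
def longest_earlier_match(s, i, allow_overlap):
    """Length of the longest prefix of s[i:] that also occurs starting at some j < i.

    With allow_overlap=False the earlier occurrence must end before position i.
    Brute force, for illustration only.
    """
    n = len(s)
    best = 0
    for j in range(i):
        length = 0
        while (i + length < n
               and (allow_overlap or j + length < i)
               and s[j + length] == s[i + length]):
            length += 1
        best = max(best, length)
    return best


def parse(s, allow_overlap=True, triples=False):
    """Greedy LZ77-type parsing of s, returned as a list of phrases."""
    phrases = []
    i = 0
    while i < len(s):
        length = longest_earlier_match(s, i, allow_overlap)
        if triples:
            # Assumed convention: earlier-occurring part plus one following letter,
            # truncated if the input ends first.
            length = min(length + 1, len(s) - i)
        else:
            # Assumed convention: a letter with no earlier occurrence is its own phrase.
            length = max(length, 1)
        phrases.append(s[i:i + length])
        i += length
    return phrases


if __name__ == "__main__":
    s = "aaaa"
    for triples in (False, True):
        for allow_overlap in (True, False):
            result = parse(s, allow_overlap=allow_overlap, triples=triples)
            print(f"triples={triples}, overlap={allow_overlap}: {result}")
```

On the input `aaaa`, the overlap-allowing variants both produce 2 phrases under these conventions, while the non-overlapping variants both produce 3, which shows how forbidding overlaps can increase the phrase count.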
