TL;DR
This paper establishes bounds on the approximation ratio of Lempel-Ziv parsing relative to the minimal bidirectional parse, introducing lexicographical parses and relating parse complexity to Burrows-Wheeler transform properties.
Contribution
It proves that the Lempel-Ziv parse size is within a logarithmic factor, O(log(n/b)), of the minimal bidirectional parse size, and introduces lexicographical parses as a new ordered greedy parsing method.
Findings
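Upper bound z = O(b log(n/b)) on the Lempel-Ziv parse size z relative to the minimal bidirectional parse size b, where n is the text length
Existence of text families where z = Ω(b log n), so the bound is tight as a function of n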
Introduction of lexicographical parses with similar bounds
Abstract
Shannon's entropy is a clear lower bound for statistical compression. The situation is not so well understood for dictionary-based compression. A plausible lower bound is b, the least number of phrases of a general bidirectional parse of a text, where phrases can be copied from anywhere else in the text. Since computing b is NP-complete, a popular gold standard is z, the number of phrases in the Lempel-Ziv parse of the text, which is the optimal one when phrases can be copied only from the left. While z can be computed in linear time with a greedy algorithm, almost nothing has been known for decades about its approximation ratio with respect to b. In this paper we prove that z = O(b log(n/b)), where n is the text length. We also show that the bound is tight as a function of n, by exhibiting a text family where z = Ω(b log n). Our upper bound is obtained by building…
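To make the quantity z concrete, here is a minimal sketch of a greedy left-to-right Lempel-Ziv parse. It is not the paper's linear-time algorithm: it is a naive quadratic-time illustration, and it assumes one common variant of the parse in which a phrase is either the longest prefix of the remaining text that also starts at some earlier position (the copy may overlap the phrase itself) or a single explicit character. The function name `lz_parse` is hypothetical.

```python
def lz_parse(text: str) -> list[str]:
    """Greedy left-to-right Lempel-Ziv parse (naive, quadratic-time sketch)."""
    phrases = []
    i = 0
    n = len(text)
    while i < n:
        best = 0
        # Try every earlier starting position j < i as a copy source.
        for j in range(i):
            length = 0
            # The source may run into the phrase being formed (overlap allowed).
            while i + length < n and text[j + length] == text[i + length]:
                length += 1
            best = max(best, length)
        # With no earlier occurrence, emit one explicit character.
        step = best if best > 0 else 1
        phrases.append(text[i:i + step])
        i += step
    return phrases


if __name__ == "__main__":
    parse = lz_parse("abcabcabc")
    print(parse, "z =", len(parse))
```

The number of phrases returned is z for the assumed variant. Computing b, by contrast, is NP-complete according to the abstract, so no comparably simple routine is offered for it.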
