HOLZ: High-Order Entropy Encoding of Lempel-Ziv Factor Distances
Dominik Köppl, Gonzalo Navarro, Nicola Prezza

TL;DR
HOLZ encodes the offsets of Lempel-Ziv factors using the co-lexicographic order of the processed prefixes, so that the encoded offsets tend toward the k-th order empirical entropy. This improves compression on datasets whose high-order entropy is small.
Contribution
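It presents a new offset representation based on the co-lexicographic order of the processed prefixes. The selected offsets tend to approach the k-th order empirical entropy and, on datasets with small high-order entropy, compare favorably with the rightmost and the bit-optimal LZ parsings.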
Findings
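HOLZ offsets outperform the rightmost LZ parsing on datasets with small high-order entropy.
HOLZ offsets outperform the bit-optimal LZ parsing on the same kind of datasets.
The reported advantage is tied to datasets with small high-order entropy.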
Abstract
We propose a new representation of the offsets of the Lempel-Ziv (LZ) factorization based on the co-lexicographic order of the processed prefixes. The selected offsets tend to approach the k-th order empirical entropy. Our evaluations show that this choice of offsets is superior to the rightmost LZ parsing and the bit-optimal LZ parsing on datasets with small high-order entropy.
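The abstract relies on two standard notions: the k-th order empirical entropy of a text and the co-lexicographic order of its prefixes (prefixes compared by reading them right to left). The following is a minimal Python sketch of both, offered only as an illustration of the definitions. It is not the authors' implementation and does not perform the HOLZ offset encoding itself; the function names `empirical_entropy_k` and `colex_prefix_order` are hypothetical, and the quadratic-time prefix sort is chosen for readability, not efficiency.

```python
from collections import Counter, defaultdict
from math import log2


def empirical_entropy_k(text, k):
    """k-th order empirical entropy of `text`, in bits per character.

    Uses the usual definition: group every character by the k characters
    that precede it, take the zeroth-order entropy of each group, and
    average the total number of bits over the text length.
    """
    n = len(text)
    if n == 0:
        return 0.0
    # followers[w] counts the characters that appear right after context w.
    followers = defaultdict(Counter)
    for i in range(k, n):
        followers[text[i - k:i]][text[i]] += 1
    total_bits = 0.0
    for counts in followers.values():
        m = sum(counts.values())
        total_bits -= sum(c * log2(c / m) for c in counts.values())
    return total_bits / n


def colex_prefix_order(text):
    """Prefix lengths 1..n sorted by co-lexicographic order.

    Two prefixes are compared by their reversed strings, so prefixes that
    end with the same context end up next to each other.
    Illustrative O(n^2 log n) version; assumed helper, not from the paper.
    """
    return sorted(range(1, len(text) + 1), key=lambda i: text[:i][::-1])


if __name__ == "__main__":
    print(empirical_entropy_k("abababab", 0))  # two equally frequent symbols
    print(empirical_entropy_k("abababab", 1))  # each symbol determines the next
    print(colex_prefix_order("banana"))
```

In the co-lexicographic order, prefixes that share a long suffix are neighbors, which is the intuition for why offsets expressed relative to this order can be small when the text has low high-order entropy.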
Taxonomy
Topics: Natural Language Processing Techniques · Algorithms and Data Compression · Machine Learning in Bioinformatics
