Analyzing and Leveraging the $k$-Sensitivity of LZ77
Gabriel Bathie, Paul Huber, Guillaume Lagarde, Akka Zemmari

TL;DR
This paper studies how a small number of edits to a string changes its LZ77-compressed size. It gives tight bounds on this sensitivity, contrasts the result with LZ78 (where a single edit can significantly deteriorate compressibility), and presents an algorithm that pre-edits a string to improve its compression.
Contribution
The paper proves tight bounds on the $k$-sensitivity of LZ77 compression, classifies the impact of edits according to how compressible the string is, and gives an approximation algorithm for pre-editing a string to improve its compression.
Findings
Tight upper bounds on how much $k$ edits can deteriorate LZ77 compression.
A trichotomy of compression sensitivity based on string compressibility.
An $\varepsilon$-approximation algorithm for pre-editing to reduce compressed size.
Abstract
We study the sensitivity of the Lempel-Ziv 77 compression algorithm to edits, showing how modifying a string can deteriorate or improve its compression. Our first result is a tight upper bound for $k$ edits: , we have . This result contrasts with Lempel-Ziv 78, where a single edit can significantly deteriorate compressibility, a phenomenon known as a *one-bit catastrophe*. We further refine this bound, focusing on the coefficient in front of , and establish a surprising trichotomy based on the compressibility of . More precisely we prove the following bounds: - if , the compression may increase by up to a factor of , - if , this factor is…
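To make the notion of sensitivity concrete, the following is a minimal Python sketch, not the authors' code. It assumes one common textbook variant of LZ77: a greedy factorization in which each phrase is either a single fresh character or the longest prefix of the remaining text that also starts at an earlier position (overlapping copies allowed), and the compressed size is measured as the number of phrases. The paper's exact variant and size measure may differ. The function names `lz77_phrase_count` and `best_single_substitution` are hypothetical, and the second function is a brute-force baseline for pre-editing with a single substitution, not the approximation algorithm described in the paper.

```python
def lz77_phrase_count(w: str) -> int:
    """Number of phrases in a greedy LZ77 factorization of w.

    Assumed variant: each phrase is the longest prefix of w[i:] that also
    occurs starting at some earlier position j < i (overlap allowed), or a
    single character if no such occurrence exists. Naive cubic-time scan,
    intended for short illustrative strings only.
    """
    n = len(w)
    i = 0
    phrases = 0
    while i < n:
        longest = 0
        for j in range(i):
            length = 0
            # j < i guarantees j + length < i + length < n, so indexing is safe.
            while i + length < n and w[j + length] == w[i + length]:
                length += 1
            longest = max(longest, length)
        i += max(longest, 1)  # a fresh character forms a phrase of length 1
        phrases += 1
    return phrases


def best_single_substitution(w: str) -> tuple[int, str]:
    """Brute-force baseline: try every single-character substitution
    (using only characters already present in w) and return the smallest
    phrase count together with a string achieving it."""
    alphabet = sorted(set(w))
    best = (lz77_phrase_count(w), w)
    for i in range(len(w)):
        for c in alphabet:
            if c != w[i]:
                candidate = w[:i] + c + w[i + 1:]
                z = lz77_phrase_count(candidate)
                if z < best[0]:
                    best = (z, candidate)
    return best


original = "a" * 16
edited = "a" * 8 + "b" + "a" * 7  # one substitution in the middle

print(lz77_phrase_count(original))  # 2
print(lz77_phrase_count(edited))    # 4
print(best_single_substitution(edited)[0])  # 2
```

In this toy example a single substitution doubles the phrase count from 2 to 4, and the brute-force search recovers the original count by undoing that edit. The exhaustive search tries every position and every character, so it is only practical for very short strings.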
Taxonomy
Topics: Algorithms and Data Compression · Computability, Logic, AI Algorithms · semigroups and automata theory
