Sensitivity of string compressors and repetitiveness measures
Tooru Akagi, Mitsuru Funakoshi, Shunsuke Inenaga

TL;DR
This paper investigates how much a single-character edit to an input string can change the output size of string compressors and the values of repetitiveness measures, and gives worst-case bounds showing that several of them are robust.
Contribution
It establishes that common compression algorithms and repetitiveness measures have small constant worst-case sensitivity, contrasting with previously known large sensitivity results.
Findings
Lempel-Ziv 77 compressors have constant upper bounds on sensitivity.
GCIS grammar-based compressor also exhibits small constant sensitivity.
Repetitiveness measures like string attractor size and substring complexity are similarly robust.
Abstract
The sensitivity of a string compression algorithm $C$ asks how much the output size $C(T)$ for an input string $T$ can increase when a single character edit operation is performed on $T$. This notion enables one to measure the robustness of compression algorithms in terms of errors and/or dynamic changes occurring in the input string. In this paper, we analyze the worst-case multiplicative sensitivity of string compression algorithms, which is defined by $\max_{T \in \Sigma^n} \{ C(T') / C(T) : \mathrm{ed}(T, T') = 1 \}$, where $\mathrm{ed}(T, T')$ denotes the edit distance between $T$ and $T'$. For the most common versions of the Lempel-Ziv 77 compressors, we prove that the worst-case multiplicative sensitivity is upper bounded by a small constant, and give matching lower bounds. We generalize these results to the smallest bidirectional scheme $b$. In addition, we show that the sensitivity of a grammar-based…
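To make the definition of worst-case multiplicative sensitivity concrete, the following is a minimal brute-force sketch in Python. It is an illustration only, not the paper's method. The function names (`lz77_phrases`, `single_edits`, `worst_ratio`) are hypothetical. The compressor used is an assumed simple variant: a greedy factorization in which each phrase is either the longest prefix of the remaining text that also starts at an earlier position (overlaps allowed) or a single fresh character. It may not coincide with the Lempel-Ziv 77 versions analyzed in the paper.

```python
from itertools import product


def lz77_phrases(t: str) -> int:
    """Count phrases in a greedy factorization (assumed variant, see lead-in)."""
    i, count, n = 0, 0, len(t)
    while i < n:
        longest = 0
        for j in range(i):  # every earlier start position is a candidate source
            l = 0
            while i + l < n and t[j + l] == t[i + l]:
                l += 1  # the match may run past position i (self-reference)
            longest = max(longest, l)
        i += max(longest, 1)  # a fresh character forms a phrase of length 1
        count += 1
    return count


def single_edits(t: str, alphabet: str) -> set:
    """All strings at edit distance exactly 1 from t over the given alphabet."""
    out = set()
    for i in range(len(t)):
        out.add(t[:i] + t[i + 1:])  # deletion
        for c in alphabet:
            if c != t[i]:
                out.add(t[:i] + c + t[i + 1:])  # substitution
    for i in range(len(t) + 1):
        for c in alphabet:
            out.add(t[:i] + c + t[i:])  # insertion
    return out


def worst_ratio(n: int, alphabet: str = "ab") -> float:
    """Maximum of C(T') / C(T) over all T of length n >= 1 and all single edits T'."""
    best = 0.0
    for chars in product(alphabet, repeat=n):
        t = "".join(chars)
        base = lz77_phrases(t)  # at least 1 because n >= 1
        for t2 in single_edits(t, alphabet):
            best = max(best, lz77_phrases(t2) / base)
    return best
```

Calling `worst_ratio(n)` for small `n` enumerates every string of length `n`, so the running time grows exponentially. The sketch is only suitable for experimenting with very short strings and says nothing about asymptotic bounds.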
