Sublinear Algorithms for Approximating String Compressibility
Sofya Raskhodnikova, Dana Ron, Ronitt Rubinfeld, Adam Smith

TL;DR
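This paper develops sublinear-time algorithms that estimate how compressible a string is under run-length encoding (RLE) and Lempel-Ziv (LZ). It proves lower bounds showing these algorithms cannot be improved significantly, and it gives structural results connecting LZ compressibility to the number of distinct short substrings and to estimating the support size of a distribution.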
Contribution
It introduces the first sublinear algorithms for approximating RLE and LZ compressibility, along with structural lemmas that relate LZ compressibility to the number of distinct short substrings in the string.
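Under RLE a string is stored as a list of (symbol, run length) pairs, so its compressed size is roughly proportional to its number of runs. As an illustration only (this page does not describe the paper's RLE algorithm), the sketch below contrasts an exact run count, which reads the whole string, with a simple sampling estimator that reads a few random adjacent pairs and therefore has additive rather than relative error. The function names and the sample count are hypothetical choices for this sketch.

```python
import random


def rle_runs(s: str) -> int:
    """Exact number of runs (maximal blocks of equal symbols); reads all of s."""
    if not s:
        return 0
    # A run starts at position 0 and at every i where s[i] differs from s[i - 1].
    return 1 + sum(1 for i in range(1, len(s)) if s[i] != s[i - 1])


def estimate_rle_runs(s: str, samples: int, rng: random.Random) -> float:
    """Estimate the run count by sampling adjacent pairs (hypothetical helper).

    Reads at most 2 * samples symbols. By Hoeffding's inequality the estimate
    is within eps * (n - 1) of the true count with probability at least
    1 - 2 * exp(-2 * samples * eps**2), so the error is additive, not relative.
    """
    n = len(s)
    if n < 2:
        return float(n)
    boundaries = 0
    for _ in range(samples):
        i = rng.randrange(1, n)  # uniform over the n - 1 adjacent pairs
        if s[i] != s[i - 1]:
            boundaries += 1
    # Scale the sampled boundary fraction back up to all n - 1 pairs.
    return 1 + (n - 1) * boundaries / samples


if __name__ == "__main__":
    text = "aaaabbbcccccccd" * 1000  # 4 runs per copy; copies do not merge
    print(rle_runs(text))  # 4000
    print(estimate_rle_runs(text, samples=2000, rng=random.Random(0)))
```

An additive-error estimator like this is useless for strings with very few runs relative to their length, which is one reason a sublinear algorithm that targets a multiplicative approximation needs more than uniform sampling.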
Findings
Sublinear-time approximation algorithms for both schemes, with lower bounds showing they cannot be improved significantly
Structural lemmas relate LZ compressibility to the number of distinct short substrings
Approximation of LZ compressibility relates to distribution support size
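To make the structural lemmas concrete, the sketch below computes two quantities for small strings: the number of phrases in a greedy LZ77-style parse and the number of distinct substrings of a fixed length ℓ. It assumes one common LZ77 variant (each phrase is the longest prefix of the remaining text that also starts at an earlier position, overlaps allowed, or else a single new symbol); the paper's exact LZ variant and the precise form of its lemmas may differ. Both helpers are hypothetical and deliberately naive (worst-case cubic time).

```python
def lz77_phrases(s: str) -> int:
    """Phrase count of a greedy LZ77-style parse (naive, worst-case cubic time).

    Assumed variant: each phrase is the longest prefix of s[i:] that also
    starts at some earlier position j < i (the copy may overlap s[i:]);
    if no earlier position matches, the phrase is the single new symbol s[i].
    """
    n, i, phrases = len(s), 0, 0
    while i < n:
        longest = 0
        for j in range(i):
            k = 0
            # j + k < i + k < n, so both indices stay in range.
            while i + k < n and s[j + k] == s[i + k]:
                k += 1
            longest = max(longest, k)
        i += max(1, longest)
        phrases += 1
    return phrases


def distinct_substrings(s: str, ell: int) -> int:
    """Number of distinct length-ell substrings of s (ell >= 1)."""
    return len({s[i:i + ell] for i in range(len(s) - ell + 1)})


if __name__ == "__main__":
    for s in ["a" * 16, "ab" * 8, "abcdefgh"]:
        print(s, lz77_phrases(s), distinct_substrings(s, 2))
    # a*16: 2 phrases, 1 distinct pair; (ab)*8: 3 and 2; abcdefgh: 8 and 7
```

Repetitive strings score low on both measures and strings with many distinct symbols score high; the lemmas make this kind of relationship quantitative.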
Abstract
We raise the question of approximating the compressibility of a string with respect to a fixed compression scheme, in sublinear time. We study this question in detail for two popular lossless compression schemes: run-length encoding (RLE) and Lempel-Ziv (LZ), and present sublinear algorithms for approximating compressibility with respect to both schemes. We also give several lower bounds that show that our algorithms for both schemes cannot be improved significantly. Our investigation of LZ yields results whose interest goes beyond the initial questions we set out to study. In particular, we prove combinatorial structural lemmas that relate the compressibility of a string with respect to Lempel-Ziv to the number of distinct short substrings contained in it. In addition, we show that approximating the compressibility with respect to LZ is related to approximating the support size of a…
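One way to see why support-size estimation enters the picture (an illustration, not necessarily the paper's reduction): the number of distinct length-ℓ substrings of s equals the support size of the distribution D_ℓ obtained by picking a uniformly random start position and reading the next ℓ symbols, and each draw from D_ℓ costs only ℓ symbol reads. The sketch below draws samples and counts distinct values, which is the naive estimator; the names D_ℓ, `sample_ell_gram`, and `distinct_seen` are hypothetical.

```python
import random


def sample_ell_gram(s: str, ell: int, rng: random.Random) -> str:
    """One draw from D_ell: the length-ell substring at a uniform start (1 <= ell <= len(s))."""
    i = rng.randrange(len(s) - ell + 1)
    return s[i:i + ell]


def distinct_seen(s: str, ell: int, m: int, rng: random.Random) -> int:
    """Naive support-size estimate: distinct values among m draws from D_ell.

    Reads m * ell symbols of s. It never exceeds the true support size
    (the number of distinct length-ell substrings) but can fall far below
    it when many substrings occur at only a few start positions.
    """
    return len({sample_ell_gram(s, ell, rng) for _ in range(m)})


if __name__ == "__main__":
    s = "a" * 9990 + "bcdefghijk"
    exact = len({s[i:i + 3] for i in range(len(s) - 2)})  # 11 distinct 3-grams
    print(exact, distinct_seen(s, 3, m=100, rng=random.Random(1)))
```

On this string, 100 draws will usually see only "aaa", because each of the other 10 substrings occurs at a single start position out of 9998. Rare elements like these are what make estimating support size from few samples hard in general.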
Taxonomy
Topics: Algorithms and Data Compression · semigroups and automata theory · Machine Learning and Algorithms
