The Sample Complexity of Lossless Data Compression
Terence Viaud, Ioannis Kontoyiannis

TL;DR
This paper introduces a non-asymptotic framework for understanding the fundamental limits of lossless data compression, focusing on the sample complexity needed for various source types.
Contribution
It characterizes the sample complexity of lossless compression in terms of Rényi entropy and Rényi divergence, provides explicit bounds, and extends the analysis to universal compression over families of sources.
Findings
For memoryless sources, the sample complexity is characterized by the Rényi entropy of order 1/2.
Non-asymptotic bounds on the sample complexity are derived, with explicit constants.
Connections to hypothesis testing and identity testing are discussed.
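The order-1/2 quantities named above have simple closed forms. The sketch below is plain Python; the function names are illustrative and not taken from the paper. It computes H_{1/2}(P) = 2 log2 sum_i sqrt(p_i) and D_{1/2}(P||Q) = -2 log2 sum_i sqrt(p_i q_i), both in bits, and checks numerically the identity D_{1/2}(P||U) = log2 m - H_{1/2}(P) for the uniform distribution U on m symbols.

```python
import math
from typing import Sequence


def shannon_entropy(p: Sequence[float]) -> float:
    """Shannon entropy H(P) in bits; zero-probability symbols contribute nothing."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)


def renyi_entropy_half(p: Sequence[float]) -> float:
    """Renyi entropy of order 1/2 in bits: H_{1/2}(P) = 2 * log2(sum_i sqrt(p_i))."""
    return 2.0 * math.log2(sum(math.sqrt(pi) for pi in p))


def renyi_divergence_half(p: Sequence[float], q: Sequence[float]) -> float:
    """Renyi divergence of order 1/2 in bits: D_{1/2}(P||Q) = -2 * log2(sum_i sqrt(p_i * q_i))."""
    return -2.0 * math.log2(sum(math.sqrt(pi * qi) for pi, qi in zip(p, q)))


if __name__ == "__main__":
    p = [0.5, 0.25, 0.25]
    uniform = [1.0 / len(p)] * len(p)
    print(f"H(P)               = {shannon_entropy(p):.4f} bits")
    print(f"H_1/2(P)           = {renyi_entropy_half(p):.4f} bits")
    # For the uniform Q on m symbols, D_{1/2}(P||Q) = log2(m) - H_{1/2}(P).
    print(f"D_1/2(P||U)        = {renyi_divergence_half(p, uniform):.4f} bits")
    print(f"log2(m) - H_1/2(P) = {math.log2(len(p)) - renyi_entropy_half(p):.4f} bits")
```

Because Rényi entropy is non-increasing in its order, H_{1/2}(P) >= H(P), with equality exactly when P is uniform on its support; for P = (1/2, 1/4, 1/4) the script reports about 1.543 bits against H(P) = 1.5 bits.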
Abstract
A new framework is introduced for examining and evaluating the fundamental limits of lossless data compression, one that emphasizes genuinely non-asymptotic results. The sample complexity of compressing a given source is defined as the smallest blocklength at which it is possible to compress that source at a specified rate and to within a specified excess-rate probability. This formulation parallels corresponding developments in statistics and computer science, and it facilitates the use of existing results on the sample complexity of various hypothesis testing problems. For arbitrary sources, the sample complexity of general variable-length compressors is shown to be tightly coupled with the sample complexity of prefix-free codes and fixed-length codes. For memoryless sources, it is shown that the sample complexity is characterized not by the source entropy, but by its…
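To make the definition of sample complexity concrete, the following sketch computes it by brute force for a binary memoryless (Bernoulli) source. It rests on assumptions that are not taken from the paper: the compressor is the optimal one-to-one (non-prefix) code, which ranks sequences by decreasing probability and gives the i-th one floor(log2 i) bits; "excess rate" means a codelength above n*R bits; and the sample complexity is taken to be the first blocklength n <= n_max that meets the target, whereas the paper's precise definition may differ (for example, by requiring the condition at all larger n as well). The names `excess_rate_probability` and `sample_complexity` are hypothetical.

```python
import math
from typing import Optional


def excess_rate_probability(n: int, p: float, rate: float) -> float:
    """P(len*(X^n) > n * rate) for an i.i.d. Bernoulli(p) source.

    Assumes the optimal one-to-one (non-prefix) compressor: sequences are ranked
    by decreasing probability and the i-th one (i = 1, 2, ...) gets a codeword
    of floor(log2 i) bits.
    """
    q = min(p, 1.0 - p)  # fewer occurrences of the rarer symbol => more probable sequence
    # floor(log2 i) > n*rate  <=>  i >= 2**(floor(n*rate) + 1)
    threshold = 2 ** (math.floor(n * rate) + 1)
    total = 0.0
    start = 0  # number of sequences ranked ahead of the current group
    for k in range(n + 1):  # k = number of occurrences of the rarer symbol
        size = math.comb(n, k)  # all sequences with k such occurrences are equiprobable
        lo = max(start + 1, threshold)
        hi = start + size
        count = max(0, hi - lo + 1)  # ranks in this group whose codeword is too long
        total += count * (q ** k) * ((1.0 - q) ** (n - k))
        start += size
    return total


def sample_complexity(p: float, rate: float, eps: float, n_max: int = 200) -> Optional[int]:
    """First n <= n_max whose excess-rate probability is at most eps, else None.

    Hypothetical "first hit" reading of the definition (see the lead-in).
    """
    for n in range(1, n_max + 1):
        if excess_rate_probability(n, p, rate) <= eps:
            return n
    return None


if __name__ == "__main__":
    # Illustrative parameters: Bernoulli(0.1) has Shannon entropy of about 0.469 bits/symbol.
    for eps in (0.1, 0.05, 0.01):
        print(f"eps={eps}: n* = {sample_complexity(p=0.1, rate=0.7, eps=eps)}")
```

Grouping sequences by their number of rare symbols keeps the cost at n + 1 binomial terms per blocklength instead of 2^n sequences. At small n the excess-rate probability need not decrease monotonically in n, since codeword lengths are whole bits; this is one reason the exact form of the definition matters.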
