On empirical cumulant generating functions of code lengths for individual sequences
Neri Merhav

TL;DR
This paper investigates the empirical cumulant generating function of code lengths in lossless compression of individual sequences using finite-state machines, establishing connections to Rényi entropy and extending to side information scenarios.
Contribution
It introduces an individual-sequence analogue of the Rényi entropy based on the empirical CGF, and compares fixed-to-variable (F-V) and variable-to-variable (V-V) length coding.
Findings
With V-V length coding, the best achievable empirical CGF coincides with the FS compressibility.
Extension to side information alters the complexity measure.
V-V length coding yields a better result than F-V length coding.
Abstract
We consider the problem of lossless compression of individual sequences using finite-state (FS) machines, from the perspective of the best achievable empirical cumulant generating function (CGF) of the code length, i.e., the normalized logarithm of the empirical average of the exponentiated code length. Since the probabilistic CGF is minimized in terms of the Rényi entropy of the source, one of the motivations of this study is to derive an individual-sequence analogue of the Rényi entropy, in the same way that the FS compressibility is the individual-sequence counterpart of the Shannon entropy. We consider the CGF of the code-length both from the perspective of fixed-to-variable (F-V) length coding and the perspective of variable-to-variable (V-V) length coding, where the latter turns out to yield a better result, that coincides with the FS compressibility. We also extend our…
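The quantity described in the abstract, "the normalized logarithm of the empirical average of the exponentiated code length", can be illustrated with a minimal sketch. The function name `empirical_cgf`, the base-2 exponent, the parameter `s > 0`, and the 1/s normalization are assumptions made for illustration; the paper's exact normalization (for instance, an additional division by the block length) may differ.

```python
import math

def empirical_cgf(code_lengths, s):
    """Return (1/s) * log2 of the empirical average of 2**(s * L).

    code_lengths: per-block code lengths in bits (hypothetical input).
    s: positive exponent parameter (assumed; not taken from the paper).
    """
    if s <= 0:
        raise ValueError("s must be positive")
    if not code_lengths:
        raise ValueError("need at least one code length")
    # Shift by the largest length so the powers of two cannot overflow.
    peak = max(code_lengths)
    avg = sum(2.0 ** (s * (L - peak)) for L in code_lengths) / len(code_lengths)
    return peak + math.log2(avg) / s

lengths = [1, 3, 2, 2]  # hypothetical code lengths of four blocks
print(empirical_cgf(lengths, 1.0))
```

Under these assumptions the value is never below the plain average code length (by Jensen's inequality) and approaches it as `s` tends to zero, so larger `s` penalizes long codewords more heavily.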
