Bounds for Learning Lossless Source Coding
Anders Host-Madsen

TL;DR
This paper investigates how much training data a learned lossless source coder needs to outperform a universal coder, showing that only a moderate amount is required and that the amount depends on the sequence length and the source model.
Contribution
It introduces a theoretical analysis of training data bounds for learned lossless source coding, comparing performance criteria and source models.
Findings
The amount of training data needed is proportional to the sequence length divided by the logarithm of that length.
Moderate training data suffices to outperform universal coders in IID and Markov models.
The training data requirement depends on the performance criterion and the source model.
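The scaling in the findings above can be illustrated with a small numerical sketch for the binary IID case. This is not the paper's derivation: it assumes ideal code lengths (no integer rounding), an add-1/2 estimate of the source parameter from the training data, and the well-known leading term of 0.5 * log2(l) bits for the redundancy of a universal coder on a one-parameter source. All function names are hypothetical.

```python
import math


def kl_bernoulli_bits(p, q):
    """KL divergence D(p || q) in bits between Bernoulli(p) and Bernoulli(q)."""
    total = 0.0
    for a, b in ((p, q), (1.0 - p, 1.0 - q)):
        if a > 0.0:
            total += a * math.log2(a / b)
    return total


def learned_redundancy_bits(p, m, l):
    """Expected redundancy of coding l symbols with a parameter learned from m symbols.

    Assumption: the coder uses p_hat = (k + 0.5) / (m + 1), where k is the
    number of ones in the training data, so p_hat is never exactly 0 or 1.
    The expectation is taken over the binomial distribution of k.
    """
    expected_kl = 0.0
    for k in range(m + 1):
        prob_k = math.comb(m, k) * p**k * (1.0 - p) ** (m - k)
        p_hat = (k + 0.5) / (m + 1)
        expected_kl += prob_k * kl_bernoulli_bits(p, p_hat)
    return l * expected_kl


def universal_redundancy_bits(l):
    """Leading term of universal coding redundancy for a binary IID source."""
    return 0.5 * math.log2(l)


def min_training_size(p, l):
    """Smallest m (by linear search) at which the learned coder beats the universal one."""
    m = 1
    while learned_redundancy_bits(p, m, l) >= universal_redundancy_bits(l):
        m += 1
    return m


if __name__ == "__main__":
    for l in (256, 1024):
        m = min_training_size(0.3, l)
        print(f"l = {l}: m = {m}, l / ln(l) = {l / math.log(l):.1f}")
```

Under these assumptions the expected per-symbol penalty of the learned coder is roughly 1 / (2 m ln 2) bits, so the break-even point sits near m = l / ln(l), which matches the proportionality stated in the findings. The linear search is only practical for small l; for larger values a closed-form estimate would be the better choice.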
Abstract
This paper asks a basic question: how much training is required to beat a universal source coder? Traditionally, there have been two types of source coders: fixed, optimum coders such as Huffman coders; and universal source coders, such as Lempel-Ziv. The paper considers a third type of source coder: learned coders. These are coders that are trained on data of a particular type, and then used to encode new data of that type. This is a type of coder that has recently become very popular for (lossy) image and video coding. The paper considers two criteria for performance of learned coders: the average performance over training data, and a guaranteed performance over all training except for some error probability. In both cases the coders are evaluated with respect to redundancy. The paper considers the IID binary case and binary Markov chains. In both cases it is shown that the…
