Efficient Representation of Large-Alphabet Probability Distributions
Aviv Adler, Jennifer Tang, Yury Polyanskiy

TL;DR
This paper introduces a compander-based quantization method for large-alphabet probability distributions. The method substantially reduces KL-divergence loss compared with uniform quantization and floating point representations, and the paper includes a theoretical optimality analysis.
Contribution
It proposes a compander built on an ArcSinh function for quantizing large-alphabet probability distributions, with near-minimax theoretical performance.
Findings
KL-divergence loss reduced from 0.5 to 10^{-4} bits/entry for 8-bit quantization, relative to uniform binning.
Method improves representation quality for real-world data like word frequencies and DNA counts.
Theoretically, the ArcSinh compander attains near-minimax optimality for KL divergence.
Abstract
A number of engineering and scientific problems require representing and manipulating probability distributions over large alphabets, which we may think of as long vectors of reals summing to 1. In some cases it is required to represent such a vector with only b bits per entry. A natural choice is to partition the interval [0, 1] into 2^b uniform bins and quantize entries to each bin independently. We show that a minor modification of this procedure -- applying an entrywise non-linear function (compander) f prior to quantization -- yields an extremely effective quantization method. For example, for b = 8 and large alphabets, the quality of representation improves from a loss (under KL divergence) of 0.5 bits/entry to 10^{-4} bits/entry. Compared to floating point representations, our compander method improves the loss from …
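The procedure described in the abstract (apply a compander entrywise, quantize uniformly, then decode) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the specific curve `arcsinh(sqrt(scale * x)) / arcsinh(sqrt(scale))`, the choice `scale = k * log(k)`, decoding to bin midpoints, and the final renormalization are all assumptions made here for the example, and the function names are hypothetical.

```python
import math
import random


def compand(x, scale):
    """Map x in [0, 1] to [0, 1] with an arcsinh-shaped curve (assumed form)."""
    return math.asinh(math.sqrt(scale * x)) / math.asinh(math.sqrt(scale))


def expand(y, scale):
    """Inverse of compand: recover x from its companded value y."""
    return math.sinh(y * math.asinh(math.sqrt(scale))) ** 2 / scale


def quantize(p, bits, scale=None):
    """Return one bin index per entry; scale=None means plain uniform bins."""
    levels = 2 ** bits
    indices = []
    for x in p:
        y = x if scale is None else compand(x, scale)
        indices.append(min(int(y * levels), levels - 1))  # clamp y == 1.0
    return indices


def dequantize(indices, bits, scale=None):
    """Decode each index to its bin midpoint, undo the compander, renormalize."""
    levels = 2 ** bits
    mids = [(i + 0.5) / levels for i in indices]
    values = mids if scale is None else [expand(y, scale) for y in mids]
    total = sum(values)
    return [v / total for v in values]


def kl_bits(p, q):
    """KL divergence D(p || q) in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def compare(k=1000, bits=8, seed=0):
    """Return (uniform KL, compander KL) for one random distribution of size k."""
    rng = random.Random(seed)
    raw = [rng.expovariate(1.0) for _ in range(k)]  # normalized: flat Dirichlet
    total = sum(raw)
    p = [r / total for r in raw]
    scale = k * math.log(k)  # assumed scale, chosen only for illustration
    q_uniform = dequantize(quantize(p, bits), bits)
    q_compand = dequantize(quantize(p, bits, scale), bits, scale)
    return kl_bits(p, q_uniform), kl_bits(p, q_compand)
```

Calling `compare()` returns the two KL losses for the same random input, so the effect of the compander can be checked directly. The intuition is that uniform bins of width 2^{-b} lump almost all entries of a large-alphabet distribution into the first bin, whereas the compander stretches the region near zero before binning.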
Taxonomy
Topics: Error Correcting Code Techniques · Advanced Data Compression Techniques · Algorithms and Data Compression
