Efficient Representation of Large-Alphabet Probability Distributions

Aviv Adler; Jennifer Tang; Yury Polyanskiy

arXiv:2205.03752 · cs.IT · October 27, 2023 · 1 citation

TL;DR

This paper introduces a compander-based quantization method for large-alphabet probability distributions. It substantially reduces representation loss (measured in KL divergence) compared to uniform quantization and to floating-point representations, and it comes with a theoretical optimality analysis.

Contribution

It proposes a compander approach based on an ArcSinh function for efficient quantization of large-alphabet probability distributions, with near-minimax-optimal theoretical performance under KL divergence.

Findings

01 For 8-bit quantization of $10^{5}$-sized alphabets, the KL-divergence loss drops from 0.5 bits/entry with uniform bins to $10^{-4}$ bits/entry with the compander.

02 The method improves representation quality for real-world data such as word frequencies and DNA counts.

03 Theoretically, the ArcSinh compander attains near-minimax optimality for KL divergence.

Abstract

A number of engineering and scientific problems require representing and manipulating probability distributions over large alphabets, which we may think of as long vectors of reals summing to $1$. In some cases it is required to represent such a vector with only $b$ bits per entry. A natural choice is to partition the interval $[0, 1]$ into $2^{b}$ uniform bins and quantize entries to each bin independently. We show that a minor modification of this procedure -- applying an entrywise non-linear function (compander) $f(x)$ prior to quantization -- yields an extremely effective quantization method. For example, for $b = 8$ $(16)$ and $10^{5}$-sized alphabets, the quality of representation improves from a loss (under KL divergence) of $0.5$ $(0.1)$ bits/entry to $10^{-4}$ $(10^{-9})$ bits/entry. Compared to floating point representations, our compander method improves the loss from $10^{-1}$ $(10^{-6})$ …
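
The procedure described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the compander form `asinh(sqrt(scale * x)) / asinh(sqrt(scale))`, the choice `scale = K * log(K)`, bin-midpoint decoding, and the final renormalization are all assumptions made for this example, and every function name is hypothetical. The sketch quantizes a random distribution once with uniform bins and once with the compander, then compares the KL divergence of each reconstruction.

```python
import math
import random


def normalize(values):
    """Rescale non-negative values so that they sum to 1."""
    total = sum(values)
    return [v / total for v in values]


def kl_bits(p, q):
    """KL divergence D(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def quantize_uniform(p, bits):
    """Map each entry to the midpoint of one of 2**bits uniform bins on [0, 1]."""
    n = 2 ** bits
    decoded = []
    for x in p:
        k = min(int(x * n), n - 1)       # bin index in 0 .. n-1
        decoded.append((k + 0.5) / n)    # bin midpoint, always > 0
    return normalize(decoded)


def quantize_companded(p, bits, scale):
    """Apply an ArcSinh-style compander, quantize uniformly, then invert.

    Assumed compander (illustrative): f(x) = asinh(sqrt(scale*x)) / asinh(sqrt(scale)).
    """
    n = 2 ** bits
    denom = math.asinh(math.sqrt(scale))
    decoded = []
    for x in p:
        y = math.asinh(math.sqrt(scale * x)) / denom    # f(x), lies in [0, 1]
        k = min(int(y * n), n - 1)                      # uniform bin in compander domain
        y_mid = (k + 0.5) / n                           # bin midpoint in compander domain
        decoded.append(math.sinh(y_mid * denom) ** 2 / scale)  # f^{-1}(y_mid)
    return normalize(decoded)


# Demo on a random distribution (normalized i.i.d. exponential draws).
random.seed(0)
K = 1000      # alphabet size, kept small so the demo runs quickly
BITS = 8      # bits per entry
p = normalize([random.expovariate(1.0) for _ in range(K)])

q_uniform = quantize_uniform(p, BITS)
q_compander = quantize_companded(p, BITS, scale=K * math.log(K))  # scale is an assumption

loss_uniform = kl_bits(p, q_uniform)
loss_compander = kl_bits(p, q_compander)
print("uniform bins   KL (bits):", loss_uniform)
print("with compander KL (bits):", loss_compander)
```

With uniform bins, almost every entry of a large-alphabet distribution falls into the lowest bin, so the reconstruction loses nearly all information about relative magnitudes. The compander spends most of its quantization levels near zero, where the entries actually lie. The numbers printed by this sketch are not expected to match the figures reported in the paper, which depend on the authors' exact compander, decoding rule, and alphabet size.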

Taxonomy

Topics: Error Correcting Code Techniques · Advanced Data Compression Techniques · Algorithms and Data Compression