Price of universality in vector quantization is at most 0.11 bit
Alina Harbuzova, Or Ordentlich, Yury Polyanskiy

TL;DR
This paper proves the existence of a universal vector quantization codebook that is near-optimal for all data distributions: it matches a distribution-adapted codebook at a rate cost of at most 0.11 bit per dimension. The result has implications for efficient low-precision model deployment.
Contribution
It establishes the existence of a single codebook for vector quantization that performs near-optimally across all data statistics, without being adapted to any specific data distribution.
Findings
A universal codebook exists with a rate loss of at most 0.11 bit per dimension.
The universal codebook is simultaneously near-optimal for all data statistics.
A net exists in $\mathbb{R}^n$ that is a near-optimal covering simultaneously with respect to all Hilbert norms.
Abstract
Fast computation of a matrix product $W^\top X$ is a workhorse of modern LLMs. To make their deployment more efficient, a popular approach is to use a low-precision approximation $\widehat{W}$ in place of the true $W$ ("weight-only quantization"). Information theory demonstrates that an optimal algorithm for reducing the precision of $W$ depends on the (second-order) statistics of $X$ and requires a careful alignment of the vector quantization codebook with the PCA directions of $X$ (a process known as "waterfilling allocation"). Dependence of the codebook on the statistics of $X$, however, is highly impractical. This paper proves that there exists a universal codebook that is simultaneously near-optimal for all possible statistics of $X$, in the sense of being at least as good as an $X$-adapted waterfilling codebook with rate reduced by 0.11 bit per dimension. Such a universal codebook would be an…
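For intuition about the "waterfilling allocation" that the abstract refers to, the following is a minimal sketch of textbook reverse water-filling for a Gaussian source, not the paper's universal construction. It assumes the weights are i.i.d. unit-variance Gaussian, so that the eigenvalues of the covariance of $X$ act as per-direction variances. The function name `reverse_waterfilling` and its parameters are hypothetical and chosen only for illustration.

```python
import math

def reverse_waterfilling(eigenvalues, rate_per_dim, iters=100):
    """Textbook reverse water-filling (illustrative sketch, not the paper's method).

    eigenvalues  : per-direction variances, assumed here to be the PCA
                   eigenvalues of the covariance of X.
    rate_per_dim : average bit budget per dimension.
    Returns (water level theta, bits per direction, average distortion).
    """
    n = len(eigenvalues)
    log_lams = [math.log2(lam) for lam in eigenvalues if lam > 0]

    # Average rate as a function of t = log2(theta); it decreases as t grows.
    def rate(t):
        return sum(max(0.0, 0.5 * (ll - t)) for ll in log_lams) / n

    hi = max(log_lams)                  # water level at the top: rate is 0
    lo = hi - 2.0 * n * rate_per_dim    # low enough that rate >= budget

    # Bisection on the log water level until the budget is met.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if rate(mid) > rate_per_dim:
            lo = mid
        else:
            hi = mid

    t = hi
    theta = 2.0 ** t
    # Directions with variance below the water level receive zero bits.
    bits = [max(0.0, 0.5 * (math.log2(lam) - t)) if lam > 0 else 0.0
            for lam in eigenvalues]
    # Each direction contributes distortion min(theta, variance).
    distortion = sum(min(theta, lam) for lam in eigenvalues) / n
    return theta, bits, distortion

theta, bits, distortion = reverse_waterfilling([4.0, 1.0, 0.25], rate_per_dim=1.0)
```

With eigenvalues 4, 1 and 0.25 and a budget of 1 bit per dimension, the water level settles at about 0.25, the bit allocation is about 2, 1 and 0, and the weakest direction is not coded at all. The allocation changes whenever the eigenvalues change, which is the dependence on the statistics of $X$ that a universal codebook would avoid.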
Taxonomy
Topics: Advanced Data Compression Techniques · Advanced Data Storage Technologies · Error Correcting Code Techniques
