Every Bit Counts: A Theoretical Study of Precision-Expressivity Tradeoffs in Quantized Transformers
Sayak Chakrabarti, Toniann Pitassi, Josh Alman

TL;DR
This paper provides a theoretical analysis of how quantization affects the expressivity of Transformer models, revealing a precise tradeoff where losing even one bit of precision can impair the model's ability to perform equality-based tasks.
Contribution
It introduces a formal framework demonstrating the exact precision threshold needed for Transformers to compute equality functions, linking quantization to expressivity loss.
Findings
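For every p, there is a function Γ, inspired by the equality function, that a one-layer softmax Transformer can compute with p bits of precision but not with p-1 bits.
Tasks that require equality-like comparisons (exact match, membership, etc.) are especially sensitive to quantization.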
A tight one-bit threshold for the expressivity of quantized Transformers is established.
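As a loose illustration of why a single bit can matter for equality-like comparisons, the sketch below counts how many pairs of distinct inputs become indistinguishable under a uniform quantizer. It is not the paper's function Γ or its Transformer construction. The quantizer, the input grid k / 2^p, and the names `quantize` and `count_false_matches` are assumptions made only for this example.

```python
def quantize(x: float, bits: int) -> int:
    """Map x in [0, 1) to one of 2**bits integer codes (uniform, round-down).

    Hypothetical quantizer chosen for illustration; not taken from the paper.
    """
    levels = 1 << bits
    return min(int(x * levels), levels - 1)


def count_false_matches(p: int, bits: int) -> int:
    """Count pairs of distinct inputs k / 2**p whose quantized codes coincide.

    Each such pair is a case where an equality test on the quantized values
    would wrongly report "equal".
    """
    n = 1 << p
    codes = [quantize(k / n, bits) for k in range(n)]
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if codes[i] == codes[j]
    )


if __name__ == "__main__":
    p = 4
    # With p bits, all 2**p inputs keep distinct codes.
    print(count_false_matches(p, p))
    # With p - 1 bits, neighbouring inputs share a code (pigeonhole).
    print(count_false_matches(p, p - 1))
```

With p = 4, the 16 grid points stay distinct at 4 bits, while at 3 bits each code is shared by two neighbouring points, so equality on the quantized values can no longer separate them. This is only a counting argument on scalar inputs; the paper's result concerns what a one-layer softmax Transformer can compute.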
Abstract
Quantization reduces the numerical precision of Transformer computations and is widely used to accelerate inference, yet its effect on expressivity remains poorly characterized. We demonstrate a fine-grained theoretical tradeoff between expressivity and precision: for every p, we exhibit a function Γ, inspired by the equality function, and prove that a one-layer softmax Transformer can compute Γ with p bits of precision, but not with p-1 bits of precision. This result concretely explains the widely observed phenomenon of empirical loss of expressivity when quantization is used. Practically, it suggests that tasks requiring equality-like comparisons (exact match, membership, etc.) are especially sensitive to quantization. Dropping even one bit can cross a threshold where the model cannot represent the needed comparison reliably. Thus, it paves the way for developing…
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Neural Network Applications · Embedded Systems Design Techniques
