Matryoshka Quantization
Pranav Nair, Puranjay Datta, Jeff Dean, Prateek Jain, Aditya Kusupati

TL;DR
Matryoshka Quantization introduces a multi-scale quantization method that enables training a single model to be served at various precisions, significantly improving low-bit quantization quality and flexibility.
Contribution
The paper proposes Matryoshka Quantization, a novel nested quantization technique that enhances low-precision model performance and allows flexible deployment at different precisions from one trained model.
Findings
Int2 models with MatQuant outperform standard int2 quantization by up to 7%.
Using an extra bit for outliers yields a 6% improvement at 2.05-bit precision.
MatQuant enables a single model to serve multiple precisions effectively.
Abstract
Quantizing model weights is critical for reducing the communication and inference costs of large models. However, quantizing models -- especially to low precisions like int4 or int2 -- requires a trade-off in model quality; int2, in particular, is known to severely degrade model quality. Consequently, practitioners are often forced to maintain multiple models with different quantization levels or serve a single model that best satisfies the quality-latency trade-off. On the other hand, integer data types, such as int8, inherently possess a nested (Matryoshka) structure where smaller bit-width integers, like int4 or int2, are nested within the most significant bits. Leveraging this insight, in this paper, we propose Matryoshka Quantization (MatQuant), a novel multi-scale quantization technique that alleviates the aforementioned challenge. This technique allows us to train and maintain a…
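The nested structure described in the abstract can be illustrated with a small sketch. It assumes unsigned 8-bit codes and plain truncation by right-shift; the function names `slice_msb` and `rescale` are hypothetical, and the paper's actual slicing operator may differ (for example, by rounding rather than truncating).

```python
def slice_msb(code: int, bits: int, total_bits: int = 8) -> int:
    """Keep the `bits` most significant bits of an unsigned `total_bits`-bit code.

    Assumption: plain truncation via right-shift, used here only to
    illustrate the nesting; not necessarily the paper's operator.
    """
    if not 0 <= code < (1 << total_bits):
        raise ValueError("code does not fit in total_bits")
    if not 1 <= bits <= total_bits:
        raise ValueError("bits must be between 1 and total_bits")
    return code >> (total_bits - bits)


def rescale(sliced: int, bits: int, total_bits: int = 8) -> int:
    """Place a sliced code back on the original `total_bits`-bit grid."""
    return sliced << (total_bits - bits)


# Hypothetical 8-bit weight codes.
codes = [0, 37, 128, 200, 255]

int4_codes = [slice_msb(c, 4) for c in codes]
int2_codes = [slice_msb(c, 2) for c in codes]

# Nesting: slicing the int4 code down to 2 bits gives the same result
# as slicing the original int8 code directly to 2 bits.
nested_int2 = [slice_msb(c4, 2, total_bits=4) for c4 in int4_codes]

print(int4_codes)
print(int2_codes)
print(nested_int2 == int2_codes)
```

In this sketch the int2 code is always a prefix of the int4 code, which is in turn a prefix of the int8 code, so one stored set of int8 weights yields all three precisions without separate copies.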
Taxonomy
Topics: Optical and Acousto-Optic Technologies · Advanced Control and Stabilization in Aerospace Systems
Methods: Balanced Selection
