ParetoQ: Improving Scaling Laws in Extremely Low-bit LLM Quantization
Zechun Liu, Changsheng Zhao, Hanxian Huang, Sijia Chen, Jing Zhang, Jiawei Zhao, Scott Roy, Lisa Jin, Yunyang Xiong, Yangyang Shi, Lin Xiao, Yuandong Tian, Bilge Soran, Raghuraman Krishnamoorthi, Tijmen Blankevoort, Vikas Chandra

TL;DR
ParetoQ introduces a unified framework to compare low-bit quantization methods for large language models, revealing a critical transition between 2 and 3 bits and achieving state-of-the-art results with fewer parameters.
Contribution
The paper presents ParetoQ, the first unified framework for rigorously comparing 1-bit, 1.58-bit, 2-bit, 3-bit, and 4-bit quantization, and uses it to improve on methods tailored to specific bit widths and to characterize the learning transition between 2 and 3 bits.
Findings
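A notable learning transition occurs between 2 and 3 bits: at 3 bits and above, fine-tuned models stay close to their original pre-trained distributions, whereas at 2 bits and below the representations change drastically.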
ParetoQ surpasses previous methods in accuracy with fewer parameters.
2-bit and 3-bit quantizations maintain competitive performance.
Abstract
The optimal bit-width for achieving the best trade-off between quantized model size and accuracy has been a subject of ongoing debate. While some advocate for 4-bit quantization, others propose that 1.58-bit offers superior results. However, the lack of a cohesive framework for different bits has left such conclusions relatively tenuous. We present ParetoQ, the first unified framework that facilitates rigorous comparisons across 1-bit, 1.58-bit, 2-bit, 3-bit, and 4-bit quantization settings. Our findings reveal a notable learning transition between 2 and 3 bits: For 3-bits and above, the fine-tuned models stay close to their original pre-trained distributions, whereas for learning 2-bit networks or below, the representations change drastically. By optimizing training schemes and refining quantization functions, ParetoQ surpasses all previous methods tailored to specific bit widths.…
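The bit-widths named in the abstract can be related to weight-level counts and storage with a little arithmetic. The sketch below is illustrative only and is not ParetoQ's quantization function: the helper names are hypothetical, the 125,000,000 parameter count is an assumed round figure, and the storage estimate ignores scale factors, packing overhead, and any layers kept in higher precision. The "1.58-bit" label corresponds to ternary weights, since log2(3) ≈ 1.585.

```python
import math

# Number of distinct weight values for each setting named in the abstract.
LEVELS = {"1-bit": 2, "1.58-bit": 3, "2-bit": 4, "3-bit": 8, "4-bit": 16}


def bits_per_weight(levels: int) -> float:
    """Bits needed to index one of `levels` distinct values."""
    return math.log2(levels)


def weight_storage_mb(num_params: int, levels: int) -> float:
    """Idealized weight storage in MB (10**6 bytes); ignores all overhead."""
    return num_params * bits_per_weight(levels) / 8 / 1e6


def ternary_quantize(weights, scale):
    """Generic round-to-nearest ternary quantizer (an assumption, not the
    paper's function): maps each weight to one of {-scale, 0, +scale}."""
    return [max(-1, min(1, round(w / scale))) * scale for w in weights]


if __name__ == "__main__":
    num_params = 125_000_000  # assumed round figure for illustration
    for name, levels in LEVELS.items():
        size = weight_storage_mb(num_params, levels)
        print(f"{name:>8}: {bits_per_weight(levels):.3f} bits/weight, "
              f"~{size:.1f} MB of weights")
    print(ternary_quantize([0.9, -0.1, -2.0], scale=1.0))
```

Under these idealized assumptions, halving the bit-width halves weight storage, which is why the size versus accuracy trade-off has to be judged across bit-widths rather than at a single one.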
Code & Models
- 🤗 facebook/MobileLLM-ParetoQ-350M-4-bit (model)
- 🤗 facebook/MobileLLM-ParetoQ-125M-BF16 (model)
- 🤗 facebook/MobileLLM-ParetoQ-125M-1-bit (model)
- 🤗 facebook/MobileLLM-ParetoQ-125M-1.58-bit (model)
- 🤗 facebook/MobileLLM-ParetoQ-125M-2-bit (model)
- 🤗 facebook/MobileLLM-ParetoQ-125M-3-bit (model)
- 🤗 facebook/MobileLLM-ParetoQ-125M-4-bit (model)
- 🤗 facebook/MobileLLM-ParetoQ-350M-BF16 (model)
- 🤗 facebook/MobileLLM-ParetoQ-350M-1-bit (model)
- 🤗 facebook/MobileLLM-ParetoQ-350M-1.58-bit (model)
Taxonomy
Topics: Particle Detector Development and Performance · Atomic and Subatomic Physics Research · Advanced Data Compression Techniques
