Norm Tweaking: High-performance Low-bit Quantization of Large Language Models
Liang Li, Qingyuan Li, Bo Zhang, Xiangxiang Chu

TL;DR
This paper introduces 'norm tweaking', a plugin technique for post-training quantization (PTQ) that significantly improves the low-bit quantization accuracy of large language models, reaching 2-bit accuracy comparable to the float models on GLM-130B and OPT-66B.
Contribution
The paper proposes a novel norm tweaking method that enhances existing PTQ techniques, allowing high-precision low-bit quantization of large language models with minimal performance degradation.
Findings
Achieves 2-bit quantization accuracy comparable to float models on GLM-130B and OPT-66B.
Significantly outperforms existing PTQ methods in both weight-only quantization and joint quantization of weights and activations.
Demonstrates practical applicability for real-world deployment of large language models.
Abstract
As the size of large language models (LLMs) continues to grow, model compression without sacrificing accuracy has become a crucial challenge for deployment. While some quantization methods, such as GPTQ, have made progress in achieving acceptable 4-bit weight-only quantization, attempts at lower-bit quantization often result in severe performance degradation. In this paper, we introduce a technique called norm tweaking, which can be used as a plugin in current PTQ methods to achieve high precision while being cost-efficient. Our approach is inspired by the observation that rectifying the quantized activation distribution to match its float counterpart can readily restore accuracy for LLMs. To achieve this, we carefully design a tweaking strategy that includes calibration data generation and channel-wise distance constraint to update the weights of normalization layers for better…
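The core idea in the abstract (keep the quantized weights fixed and update only the normalization layer's parameters so that the quantized activations match the float ones channel by channel) can be illustrated with a toy sketch. Everything below is an assumption made for illustration, not the authors' implementation: the round-to-nearest quantizer stands in for a real PTQ method such as GPTQ, random Gaussian inputs stand in for the generated calibration data, the squared mean/variance distance is one plausible form of a channel-wise constraint, and the helper names (`quantize_rtn`, `channel_distance`, `tweak_norm`) are hypothetical.

```python
import random

def quantize_rtn(weights, bits=2):
    """Round-to-nearest uniform quantization per output row (stand-in for a real PTQ method)."""
    levels = 2 ** bits - 1
    out = []
    for row in weights:
        lo, hi = min(row), max(row)
        scale = (hi - lo) / levels or 1.0  # avoid a zero scale for constant rows
        out.append([lo + round((w - lo) / scale) * scale for w in row])
    return out

def forward(x_batch, gamma, beta, weights):
    """Per-channel scale/shift (the norm layer's affine part) followed by a linear layer."""
    outs = []
    for x in x_batch:
        h = [g * xi + b for g, xi, b in zip(gamma, x, beta)]
        outs.append([sum(w * hj for w, hj in zip(row, h)) for row in weights])
    return outs

def channel_stats(outs):
    """Mean and variance of each output channel over the calibration batch."""
    n = len(outs)
    stats = []
    for c in range(len(outs[0])):
        col = [o[c] for o in outs]
        mu = sum(col) / n
        stats.append((mu, sum((v - mu) ** 2 for v in col) / n))
    return stats

def channel_distance(float_outs, quant_outs):
    """Assumed channel-wise distance: squared gaps in per-channel mean and variance."""
    fs, qs = channel_stats(float_outs), channel_stats(quant_outs)
    return sum((mf - mq) ** 2 + (vf - vq) ** 2
               for (mf, vf), (mq, vq) in zip(fs, qs)) / len(fs)

def tweak_norm(x_batch, gamma, beta, w_float, w_quant, steps=50, lr=0.05, eps=1e-4):
    """Update only gamma/beta; the quantized weights are never modified."""
    target = forward(x_batch, gamma, beta, w_float)  # float reference activations
    d = len(gamma)
    params = list(gamma) + list(beta)

    def loss(p):
        return channel_distance(target, forward(x_batch, p[:d], p[d:], w_quant))

    current = loss(params)
    for _ in range(steps):
        # Finite-difference gradient keeps the sketch dependency-free.
        grad = [(loss(params[:i] + [params[i] + eps] + params[i + 1:]) - current) / eps
                for i in range(len(params))]
        step = lr
        while step > 1e-8:  # backtracking: accept only steps that reduce the loss
            trial = [p - step * g for p, g in zip(params, grad)]
            trial_loss = loss(trial)
            if trial_loss < current:
                params, current = trial, trial_loss
                break
            step /= 2
    return params[:d], params[d:], current

random.seed(0)
d_in, d_out, n = 4, 3, 32
w_float = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
w_quant = quantize_rtn(w_float, bits=2)
calib = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(n)]
gamma0, beta0 = [1.0] * d_in, [0.0] * d_in
before = channel_distance(forward(calib, gamma0, beta0, w_float),
                          forward(calib, gamma0, beta0, w_quant))
gamma1, beta1, after = tweak_norm(calib, gamma0, beta0, w_float, w_quant)
print(f"channel-wise distance before: {before:.4f}, after: {after:.4f}")
```

Only `gamma` and `beta` change in this sketch, which is why such a step can sit on top of an existing quantizer as a plugin: the quantized weights it receives are left untouched. A real setup would presumably use automatic differentiation and proceed layer by layer through the model rather than using finite differences on a single toy layer.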
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
