Compensate Quantization Errors+: Quantized Models Are Inquisitive Learners
Yifei Gao, Jie Ou, Lei Wang, Jun Cheng, Mengchu Zhou

TL;DR
This paper introduces novel quantization techniques for large language models that improve performance by refining weight distributions and distributing errors more evenly, outperforming existing methods.
Contribution
The paper proposes Singular-value Diagonal Expansion and Cross-layer Learning to enhance weight quantization, addressing distribution shifts and error distribution issues in LLMs.
Findings
Significant performance gains over state-of-the-art quantization methods.
Effective weight distribution refinement improves quantization accuracy.
Error distribution across layers enhances overall model performance.
Abstract
The quantization of large language models (LLMs) has been a prominent research area aimed at enabling their lightweight deployment in practice. Existing research on LLM quantization has mainly explored the interplay between weights and activations or employed auxiliary components, while neglecting the necessity of adjusting weights during quantization. Consequently, original weight distributions frequently fail to yield desired results after round-to-nearest (RTN) quantization. Although incorporating techniques such as mixed precision and low-rank error approximation in LLM quantization can yield improved results, they inevitably introduce additional computational overhead. On the other hand, traditional techniques for weight quantization, such as Generative Post-Training Quantization, rely on manually tweaking weight distributions to minimize local errors, but they fall…
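For readers unfamiliar with the RTN baseline mentioned in the abstract, the following is a minimal sketch of symmetric, per-tensor round-to-nearest quantization. It is not the paper's method: the function name `rtn_quantize`, the 4-bit setting, and the sample weights are illustrative assumptions, chosen only to show how an unadjusted weight distribution can quantize poorly.

```python
def rtn_quantize(weights, bits=4):
    """Symmetric per-tensor round-to-nearest quantization (illustrative sketch).

    Hypothetical helper, not taken from the paper.
    """
    qmax = 2 ** (bits - 1) - 1                  # largest positive level, 7 for 4 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Map each weight to the nearest integer level, then clamp to the signed range.
    # Note: Python's round() resolves exact .5 ties to the nearest even integer.
    codes = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    dequant = [c * scale for c in codes]        # values the quantized model actually uses
    return codes, dequant, scale


# Made-up weights: mostly small values plus one large outlier.
weights = [0.02, -0.15, 0.40, -0.91, 0.07, 2.80]
codes, dequant, scale = rtn_quantize(weights, bits=4)
errors = [abs(w - d) for w, d in zip(weights, dequant)]

for w, c, d, e in zip(weights, codes, dequant, errors):
    print(f"w={w:+.2f}  code={c:+d}  dequant={d:+.2f}  abs_err={e:.3f}")
```

In this toy example the single outlier sets the scale, so several of the small weights round to zero. This is one way to picture why weights left unadjusted can fare poorly under plain RTN.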
Taxonomy
Topics: Neural Networks and Applications · Explainable Artificial Intelligence (XAI)
