Compensate Quantization Errors+: Quantized Models Are Inquisitive Learners

Yifei Gao; Jie Ou; Lei Wang; Jun Cheng; Mengchu Zhou

arXiv:2407.15508 · cs.CL · May 16, 2025

Open Access

TL;DR

This paper introduces novel quantization techniques for large language models that improve performance by refining weight distributions and distributing errors more evenly, outperforming existing methods.

Contribution

The paper proposes Singular-value Diagonal Expansion and Cross-layer Learning to enhance weight quantization, addressing distribution shifts and error distribution issues in LLMs.

Findings

01. Significant performance gains over state-of-the-art quantization methods.

02. Effective weight distribution refinement improves quantization accuracy.

03. Error distribution across layers enhances overall model performance.

Abstract

The quantization of large language models (LLMs) has been a prominent research area aimed at enabling their lightweight deployment in practice. Existing research on LLM quantization has mainly explored the interplay between weights and activations, or employed auxiliary components, while neglecting the need to adjust weights during quantization. Consequently, original weight distributions frequently fail to yield desired results after round-to-nearest (RTN) quantization. Although incorporating techniques such as mixed precision and low-rank error approximation into LLM quantization can yield improved results, they inevitably introduce additional computational overhead. On the other hand, traditional techniques for weight quantization, such as Generative Post-Training Quantization, rely on manually tweaking weight distributions to minimize local errors, but they fall…
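For readers unfamiliar with the RTN baseline the abstract refers to, here is a minimal sketch of symmetric, per-tensor round-to-nearest quantization in plain Python. It is not the paper's method. The function names, the 4-bit setting, and the weight values are all assumptions made up for illustration. It shows only how unadjusted weights are mapped onto an integer grid, and how a single outlier stretches the scale so that small weights collapse to zero.

```python
def rtn_quantize(weights, n_bits=4):
    """Symmetric per-tensor round-to-nearest quantization (illustrative sketch)."""
    qmax = 2 ** (n_bits - 1) - 1              # largest positive code, e.g. 7 for 4 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round each weight to the nearest grid point, then clip to the signed range.
    codes = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    dequant = [c * scale for c in codes]      # values the quantized model actually uses
    return codes, dequant, scale


def mean_squared_error(a, b):
    """Average squared difference between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Hypothetical weights with one outlier (0.9) that dominates the scale.
weights = [0.02, -0.05, 0.11, -0.31, 0.07, 0.9]
codes, dequant, scale = rtn_quantize(weights, n_bits=4)
err = mean_squared_error(weights, dequant)
print(codes, scale, err)
```

In this toy setting the two smallest weights both quantize to code 0. This is the kind of loss from leaving weight distributions untouched that the abstract describes.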

Taxonomy

Topics: Neural Networks and Applications · Explainable Artificial Intelligence (XAI)