AffineQuant: Affine Transformation Quantization for Large Language Models
Yuexiao Ma, Huixia Li, Xiawu Zheng, Feng Ling, Xuefeng Xiao, Rui Wang, Shilei Wen, Fei Chao, Rongrong Ji

TL;DR
AffineQuant introduces a novel affine transformation-based post-training quantization method for large language models, significantly reducing quantization errors and improving performance without additional overhead.
Contribution
The paper proposes a new PTQ approach using affine transformations with inverse matrices and a gradual mask optimization to ensure invertibility and minimize quantization errors.
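The role of a mask in keeping the affine matrix invertible can be illustrated with a small sketch. This summary does not describe the authors' mask schedule, so the band-widening schedule below, the names `band_mask`, `masked`, and `is_strictly_diagonally_dominant`, and the example matrix are all assumptions made for illustration. The sketch relies on one standard fact: a strictly diagonally dominant matrix is always invertible (Levy-Desplanques theorem).

```python
def band_mask(n, bandwidth):
    """Return an n x n 0/1 mask that keeps entries within `bandwidth` of the diagonal.

    Hypothetical schedule: bandwidth 0 keeps only the diagonal (pure scaling),
    and larger bandwidths expose more off-diagonal entries.
    """
    return [[1 if abs(i - j) <= bandwidth else 0 for j in range(n)] for i in range(n)]


def masked(a, mask):
    """Element-wise product of a matrix and a 0/1 mask."""
    return [[v * m for v, m in zip(row, mrow)] for row, mrow in zip(a, mask)]


def is_strictly_diagonally_dominant(a):
    """True if every |a[i][i]| exceeds the sum of the other |a[i][j]| in its row."""
    return all(
        abs(row[i]) > sum(abs(v) for j, v in enumerate(row) if j != i)
        for i, row in enumerate(a)
    )


n = 4
# Illustrative affine matrix: unit diagonal with small off-diagonal entries.
A = [[1.0 if i == j else 0.1 for j in range(n)] for i in range(n)]

# Widen the band one step at a time, from diagonal-only to the full matrix.
stages = [masked(A, band_mask(n, k)) for k in range(n)]

# Each intermediate matrix stays strictly diagonally dominant, hence invertible.
all_dominant = all(is_strictly_diagonally_dominant(s) for s in stages)
```

In this toy setup the first stage is a plain diagonal scaling and the last stage is the full matrix; because the off-diagonal entries are small, every stage along the way remains invertible.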
Findings
Achieves lower perplexity on LLaMA2-7B without overhead.
Sets new state-of-the-art PTQ benchmark on LLaMA-30B zero-shot tasks.
Demonstrates significant performance improvements across diverse datasets.
Abstract
The significant resource requirements associated with Large-scale Language Models (LLMs) have generated considerable interest in the development of techniques aimed at compressing and accelerating neural networks. Among these techniques, Post-Training Quantization (PTQ) has emerged as a subject of considerable interest due to its noteworthy compression efficiency and cost-effectiveness in the context of training. Existing PTQ methods for LLMs limit the optimization scope to scaling transformations between pre- and post-quantization weights. In this paper, we advocate for the direct optimization using equivalent Affine transformations in PTQ (AffineQuant). This approach extends the optimization scope and thus significantly minimizes quantization errors. Additionally, by employing the corresponding inverse matrix, we can ensure equivalence between the pre- and post-quantization outputs…
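The equivalence claim can be checked numerically with a minimal sketch. It assumes the convention that the weight is left-multiplied by an invertible matrix A while the activation is right-multiplied by its inverse, so that (X A^-1)(A W) = X W before quantization. The 2x2 values, the helper names, and the symmetric round-to-nearest quantizer are illustrative assumptions, not the authors' implementation.

```python
def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def inverse_2x2(m):
    """Closed-form inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def fake_quantize(w, bits=4):
    """Symmetric per-tensor round-to-nearest quantization followed by dequantization."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for row in w for v in row)
    scale = max_abs / qmax if max_abs else 1.0
    return [[max(-qmax - 1, min(qmax, round(v / scale))) * scale for v in row] for row in w]


def frobenius_error(a, b):
    """Frobenius norm of the difference between two matrices."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)) ** 0.5


X = [[1.0, 2.0], [0.5, -1.0]]   # activations (illustrative values)
W = [[0.3, -0.7], [1.2, 0.4]]   # weights (illustrative values)
A = [[2.0, 0.5], [0.25, 1.0]]   # invertible affine matrix (illustrative values)

A_inv = inverse_2x2(A)
reference = matmul(X, W)

# Without quantization, the transformed pair reproduces the original output.
transformed = matmul(matmul(X, A_inv), matmul(A, W))
equivalence_gap = frobenius_error(reference, transformed)

# With quantization, the remaining gap is the error that the choice of A influences.
quantized_output = matmul(matmul(X, A_inv), fake_quantize(matmul(A, W)))
quant_error = frobenius_error(reference, quantized_output)
```

Here `equivalence_gap` is zero up to floating-point rounding, while `quant_error` depends on A; a PTQ method of this kind would search for the A that makes that second quantity small.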
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
