AffineQuant: Affine Transformation Quantization for Large Language Models

Yuexiao Ma; Huixia Li; Xiawu Zheng; Feng Ling; Xuefeng Xiao; Rui Wang; Shilei Wen; Fei Chao; Rongrong Ji

arXiv:2403.12544 · cs.LG · March 20, 2024 · 1 citation

TL;DR

AffineQuant introduces a novel affine transformation-based post-training quantization method for large language models, significantly reducing quantization errors and improving performance without additional overhead.

Contribution

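The paper proposes a PTQ approach that applies an affine transformation to the weights and the corresponding inverse matrix to the activations, so that the layer output is unchanged in full precision. A gradual mask optimization keeps the transformation matrix invertible while the quantization error is minimized.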
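The invertibility requirement can be illustrated with a small sketch. A standard sufficient condition for a square matrix to be invertible is strict diagonal dominance (Levy-Desplanques theorem): each diagonal entry's magnitude exceeds the sum of the magnitudes of the other entries in its row. The banded schedule below is a hypothetical stand-in for a "gradual mask", not the paper's exact mask: the helper `gradual_band_mask`, its `alpha` factor, and the linear band-widening rule are all assumptions made for illustration.

```python
import numpy as np

def gradual_band_mask(n: int, step: int, total_steps: int, alpha: float = 0.1) -> np.ndarray:
    """Hypothetical mask: diagonal entries are 1, off-diagonal entries inside a band are
    alpha, all others are 0. The band half-width grows linearly from 0 to n - 1."""
    width = int(round((n - 1) * step / total_steps))
    idx = np.arange(n)
    dist = np.abs(idx[:, None] - idx[None, :])   # distance of each entry from the diagonal
    mask = np.where(dist <= width, alpha, 0.0)
    np.fill_diagonal(mask, 1.0)                  # diagonal is always fully trainable
    return mask

def is_strictly_diagonally_dominant(a: np.ndarray) -> bool:
    """True if |a_ii| > sum_{j != i} |a_ij| for every row (implies a is invertible)."""
    diag = np.abs(np.diag(a))
    off = np.abs(a).sum(axis=1) - diag
    return bool(np.all(diag > off))

# Usage: mask a hypothetical learnable matrix that starts near the identity.
rng = np.random.default_rng(0)
n = 8
raw = np.eye(n) + 0.05 * rng.standard_normal((n, n))
for step in (0, 5, 10):
    a = raw * gradual_band_mask(n, step, total_steps=10)
    print(step, is_strictly_diagonally_dominant(a), np.linalg.matrix_rank(a) == n)
```

At step 0 the mask reduces the matrix to its diagonal, which corresponds to a pure scaling transformation; widening the band gradually admits off-diagonal terms while their damped magnitude keeps each row diagonally dominant in this example.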

Findings

01. Achieves lower perplexity on LLaMA2-7B without additional overhead.

02. Sets a new state-of-the-art PTQ benchmark on LLaMA-30B zero-shot tasks.

03. Demonstrates significant performance improvements across diverse datasets.

Abstract

The significant resource requirements of Large-scale Language Models (LLMs) have generated considerable interest in techniques for compressing and accelerating neural networks. Among these techniques, Post-Training Quantization (PTQ) has attracted particular attention because of its notable compression efficiency and low training cost. Existing PTQ methods for LLMs limit the optimization scope to scaling transformations between pre- and post-quantization weights. In this paper, we advocate direct optimization using equivalent Affine transformations in PTQ (AffineQuant). This approach extends the optimization scope and thereby significantly reduces quantization errors. Additionally, by employing the corresponding inverse matrix, we can ensure equivalence between the pre- and post-quantization outputs…
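The equivalence argument can be checked numerically. Assuming the row-vector convention Y = XW (X is activations, W is a linear layer's weights) and any invertible matrix A, the product (XA^{-1})(AW) equals XW in full precision, so only the quantization of AW introduces error. The sketch below assumes weight-only 4-bit symmetric per-column quantization and measures ||XW - (XA^{-1})Q(AW)||_F; the helpers `quantize_per_channel` and `output_error` are hypothetical, and A is random rather than optimized, so this is not the paper's implementation or objective in full.

```python
import numpy as np

def quantize_per_channel(w: np.ndarray, bits: int = 4) -> np.ndarray:
    """Symmetric round-to-nearest fake quantization, one scale per output column (assumed quantizer)."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max(axis=0, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)  # avoid division by zero for all-zero columns
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale

def output_error(x: np.ndarray, w: np.ndarray, a: np.ndarray, bits: int = 4) -> float:
    """Frobenius error ||XW - (X A^{-1}) Q(AW)|| for weight-only quantization."""
    a_inv = np.linalg.inv(a)
    return float(np.linalg.norm(x @ w - (x @ a_inv) @ quantize_per_channel(a @ w, bits)))

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 8))                     # activations: 16 tokens, 8 input channels
w = rng.standard_normal((8, 4))                      # weights: 8 input channels, 4 output channels
a = np.eye(8) + 0.1 * rng.standard_normal((8, 8))    # hypothetical invertible transform, not learned

# In full precision the transform cancels: (X A^{-1})(A W) = X W.
y_ref = x @ w
y_equiv = (x @ np.linalg.inv(a)) @ (a @ w)
print("equivalent:", np.allclose(y_ref, y_equiv))

# Scaling-only methods restrict A to a diagonal matrix; an affine A may be any invertible matrix.
a_scale = np.diag(np.diag(a))
for name, t in [("identity", np.eye(8)), ("diagonal", a_scale), ("full", a)]:
    print(name, output_error(x, w, t))
```

Because A here is random, the printed errors do not compare methods; in a PTQ pipeline A would be optimized to reduce this error, and a full affine A simply gives the optimizer a larger search space than a diagonal one.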

Code & Models

Repositories

bytedance/affinequant (PyTorch · Official)

Models

ByteDance/AffineQuant (Hugging Face model)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis