ASER: Activation Smoothing and Error Reconstruction for Large Language Model Quantization

Weibo Zhao; Yubin Shi; Xinyu Lyu; Wanchen Sui; Shen Li; Yong Li

arXiv:2411.07762 · cs.LG · December 13, 2024


TL;DR

This paper introduces ASER, a post-training quantization algorithm for large language models that reduces quantization error and preserves accuracy in low-bit settings by combining activation smoothing with low-rank error reconstruction.

Contribution

ASER combines low-rank error reconstruction and activation smoothing to improve low-bit quantization of LLMs, addressing key challenges in model compression.

Findings

01. ASER achieves competitive accuracy in W4A8 quantization.

02. It effectively reduces quantization errors with minor computational overhead.

03. ASER demonstrates potential for activation quantization in LLMs.

Abstract

Quantization is a pivotal technique for large language model (LLM) serving, yet it poses significant challenges, particularly in achieving effective low-bit quantization. The limited numerical mapping makes the quantized model produce a non-trivial error, causing intolerable performance degradation. This paper is anchored in the basic idea of model compression objectives, and delves into the layer-wise error distribution of LLMs during post-training quantization. Subsequently, we introduce ASER, an algorithm consisting of (1) Error Reconstruction: low-rank compensation for quantization error with LoRA-style matrices constructed by whitening SVD; (2) Activation Smoothing: outlier extraction to gain smooth activation and better error compensation. ASER is capable of quantizing typical LLMs to low-bit ones, particularly preserving accuracy even in W4A8 per-channel setup.…
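The error reconstruction idea in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function names, the symmetric round-to-nearest quantizer, the Cholesky-based whitening, the toy shapes, and the rank are all assumptions made for illustration, and the activation smoothing step is omitted. The sketch quantizes a weight matrix per output channel, then fits a rank-r pair of LoRA-style matrices to the quantization error by taking an SVD in a whitened space, so that the approximation is weighted by the calibration activations rather than by the raw weight values.

```python
import numpy as np

def quantize_per_channel(w, bits=4):
    """Symmetric round-to-nearest quantization, one scale per output row (assumed scheme)."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)  # avoid division by zero on all-zero rows
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale  # dequantized weights

def whitened_low_rank(err, x, rank, eps=1e-6):
    """Rank-`rank` factors l_a @ l_b approximating `err` under the activation-weighted norm.

    err: (d_out, d_in) quantization error, x: (n_tokens, d_in) calibration activations.
    """
    gram = x.T @ x + eps * np.eye(x.shape[1])  # small ridge keeps the Gram matrix positive definite
    s = np.linalg.cholesky(gram)               # gram = s @ s.T, s lower triangular
    u, sigma, vt = np.linalg.svd(err @ s, full_matrices=False)
    l_a = u[:, :rank] * sigma[:rank]           # (d_out, rank)
    l_b = np.linalg.solve(s.T, vt[:rank].T).T  # (rank, d_in), equals vt[:rank] @ inv(s)
    return l_a, l_b

# Toy data (hypothetical): 256 tokens, 32 input channels, 16 output channels.
rng = np.random.default_rng(0)
x = rng.normal(size=(256, 32))
x[:, :2] *= 20.0                               # two outlier channels, as often seen in LLM activations
w = rng.normal(size=(16, 32))

w_q = quantize_per_channel(w, bits=4)
err = w - w_q
l_a, l_b = whitened_low_rank(err, x, rank=4)

base_err = np.linalg.norm(err @ x.T)                # output error of the quantized layer alone
comp_err = np.linalg.norm((err - l_a @ l_b) @ x.T)  # output error after low-rank compensation
print(base_err, comp_err)
```

At inference time the compensated layer would compute `x @ w_q.T + (x @ l_b.T) @ l_a.T`, so the extra cost is two thin matrix multiplications of rank r. The whitening step matters because the Frobenius norm of `M @ s` equals that of `M @ x.T` up to the ridge term, so truncating the SVD of `err @ s` targets the layer output error directly.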



Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis