OutlierTune: Efficient Channel-Wise Quantization for Large Language Models
Jinguang Wang, Yuexi Yin, Haifeng Sun, Qi Qi, Jingyu Wang, Zirui Zhuang, Tingting Yang, Jianxin Liao

TL;DR
OutlierTune is a per-channel post-training quantization (PTQ) method for the activations of large language models. It folds the activation scaling factors into the model weights ahead of time, avoiding extra scaling work at inference, and reports faster inference and lower memory usage than FP16.
Contribution
It introduces a new per-channel PTQ approach with dequantization pre-execution and symmetrization, addressing structured outliers in LLM activations.
Findings
Outperforms existing quantization methods across multiple tasks.
Brings Int6 quantization of instruction-tuned LLMs to a level comparable to FP16.
Runs 1.48x faster than FP16 with half the memory usage.
Abstract
Quantizing the activations of large language models (LLMs) has been a significant challenge due to the presence of structured outliers. Most existing methods focus on the per-token or per-tensor quantization of activations, making it difficult to achieve both accuracy and hardware efficiency. To address this problem, we propose OutlierTune, an efficient per-channel post-training quantization (PTQ) method for the activations of LLMs. OutlierTune consists of two components: pre-execution of dequantization and symmetrization. The pre-execution of dequantization updates the model weights by the activation scaling factors, avoiding the internal scaling and costly additional computational overheads brought by the per-channel activation quantization. The symmetrization further reduces the quantization differences arising from the weight updates by ensuring the balanced numerical ranges across…
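The idea of pre-executing dequantization can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function names, the symmetric 8-bit quantizer, and the toy shapes are all assumptions made for illustration, and the scales are computed from the same toy batch rather than from calibration data. Symmetrization and the quantization of the updated weights are left out. The sketch only shows that multiplying row j of the weight matrix by the scale of activation channel j gives the same output as dequantizing the activations before the matrix multiply.

```python
import numpy as np

def quantize_per_channel(x, n_bits=8):
    """Symmetric per-channel quantization of activations x (tokens x channels).

    Hypothetical helper: returns integer-valued activations and one scale
    per input channel.
    """
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(x).max(axis=0) / qmax        # one scale per channel
    scale = np.where(scale == 0, 1.0, scale)    # guard against all-zero channels
    x_q = np.clip(np.round(x / scale), -qmax, qmax)
    return x_q, scale

def fold_scales_into_weights(w, scale):
    """Multiply row j of w (in_channels x out_channels) by scale[j]."""
    return w * scale[:, None]

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))    # 4 tokens, 8 channels
x[:, 3] *= 50.0                # one channel with structured outliers
w = rng.normal(size=(8, 5))    # 8 input channels, 5 output channels

x_q, scale = quantize_per_channel(x)

# Reference path: dequantize the activations, then multiply.
y_dequant_first = (x_q * scale) @ w

# Folded path: scales are absorbed into the weights once, ahead of time,
# so no per-channel rescaling of activations is needed at inference.
y_folded = x_q @ fold_scales_into_weights(w, scale)

print(np.allclose(y_dequant_first, y_folded))
```

Because the scale of each activation channel multiplies exactly one row of the weight matrix, the folding is an algebraic identity; any accuracy cost comes from quantizing the activations and, in a full pipeline, the updated weights.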
Taxonomy
Topics: Speech Recognition and Synthesis · Topic Modeling · Natural Language Processing Techniques
Methods: Focus · OPT-IML
