Rethinking Channel Dimensions to Isolate Outliers for Low-bit Weight Quantization of Large Language Models
Jung Hwan Heo, Jeonghoon Kim, Beomseok Kwon, Byeongwook Kim, Se Jung Kwon, Dongsoo Lee

TL;DR
This paper introduces a novel quantization scheme called AdaDim that isolates outliers by rethinking channel dimensions, significantly improving low-bit weight quantization for large language models in resource-constrained environments.
Contribution
It proposes per-IC quantization and the AdaDim framework, which improve outlier handling and adapt to weight sensitivity patterns for more effective low-bit quantization of LLMs.
Findings
Achieves up to +4.7% on the MMLU benchmark
Improves performance on HumanEval by up to +10%
Demonstrates effectiveness across various language models
Abstract
Large Language Models (LLMs) have recently demonstrated remarkable success across various tasks. However, efficiently serving LLMs has been a challenge due to the large memory bottleneck, specifically in small batch inference settings (e.g. mobile devices). Weight-only quantization can be a promising approach, but sub-4 bit quantization remains a challenge due to large-magnitude activation outliers. To mitigate the undesirable outlier effect, we first propose per-IC quantization, a simple yet effective method that creates quantization groups within each input channel (IC) rather than the conventional per-output-channel (per-OC). Our method is motivated by the observation that activation outliers affect the input dimension of the weight matrix, so similarly grouping the weights in the IC direction can isolate outliers within a group. We also find that activation outliers do not dictate…
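The grouping idea in the abstract can be illustrated with a small counting sketch. It is not the authors' implementation: the function names, the toy 8x8 weight shape, the group size of 4, and the choice of input channel 3 as the outlier channel are all assumptions made for illustration. The sketch only counts how many quantization groups contain a weight from one outlier-affected input channel under each grouping direction.

```python
# Illustrative sketch only: names, shapes, and the outlier index are assumptions.
# Weights are indexed as (input_channel, output_channel).

def group_ids(n_in, n_out, group_size, dim):
    """Assign every weight (i, o) to a quantization group.

    dim="oc": groups live inside one output channel and span
              `group_size` consecutive input channels (per-OC).
    dim="ic": groups live inside one input channel and span
              `group_size` consecutive output channels (per-IC).
    """
    ids = {}
    for i in range(n_in):
        for o in range(n_out):
            if dim == "oc":
                ids[(i, o)] = (i // group_size, o)
            elif dim == "ic":
                ids[(i, o)] = (i, o // group_size)
            else:
                raise ValueError("dim must be 'oc' or 'ic'")
    return ids


def affected_fraction(n_in, n_out, group_size, outlier_ic, dim):
    """Fraction of groups containing at least one weight from `outlier_ic`."""
    ids = group_ids(n_in, n_out, group_size, dim)
    all_groups = set(ids.values())
    hit_groups = {g for (i, _), g in ids.items() if i == outlier_ic}
    return len(hit_groups) / len(all_groups)


# Toy 8x8 weight matrix, group size 4, outlier on input channel 3 (assumed).
frac_oc = affected_fraction(8, 8, 4, outlier_ic=3, dim="oc")
frac_ic = affected_fraction(8, 8, 4, outlier_ic=3, dim="ic")
print(f"per-OC: {frac_oc:.3f} of groups touch the outlier channel")
print(f"per-IC: {frac_ic:.3f} of groups touch the outlier channel")
```

In this toy setting, per-OC grouping spreads the outlier channel across half of all groups, while per-IC grouping confines it to the groups of that single input channel. In general the fractions are group_size / n_in versus 1 / n_in, which is one way to read the abstract's claim that grouping within each input channel isolates outliers.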
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
Methods: Balanced Selection
