Rethinking Channel Dimensions to Isolate Outliers for Low-bit Weight Quantization of Large Language Models

Jung Hwan Heo; Jeonghoon Kim; Beomseok Kwon; Byeongwook Kim; Se Jung Kwon; Dongsoo Lee

arXiv:2309.15531·cs.LG·April 15, 2025·1 cites

TL;DR

This paper introduces a novel quantization scheme called AdaDim that isolates outliers by rethinking channel dimensions, significantly improving low-bit weight quantization for large language models in resource-constrained environments.

Contribution

The paper proposes per-IC quantization and the AdaDim framework, which improve outlier handling and adapt to differing weight sensitivity patterns for more effective low-bit quantization of LLMs.

Findings

01

Achieves up to +4.7% on the MMLU benchmark

02

Improves performance on HumanEval by up to +10%

03

Demonstrates effectiveness across various language models

Abstract

Large Language Models (LLMs) have recently demonstrated remarkable success across various tasks. However, efficiently serving LLMs has been a challenge due to the large memory bottleneck, specifically in small batch inference settings (e.g. mobile devices). Weight-only quantization can be a promising approach, but sub-4 bit quantization remains a challenge due to large-magnitude activation outliers. To mitigate the undesirable outlier effect, we first propose per-IC quantization, a simple yet effective method that creates quantization groups within each input channel (IC) rather than the conventional per-output-channel (per-OC). Our method is motivated by the observation that activation outliers affect the input dimension of the weight matrix, so similarly grouping the weights in the IC direction can isolate outliers within a group. We also find that activation outliers do not dictate…
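The difference between per-OC and per-IC grouping described in the abstract can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the helper names `quantize_groups` and `fake_quant`, the `(out_features, in_features)` weight layout, and the plain round-to-nearest asymmetric quantizer are all assumptions made for the example, and the AdaDim selection logic is not shown.

```python
import numpy as np

def quantize_groups(groups, bits=3):
    """Round-to-nearest asymmetric quantization (assumed quantizer).

    Each row of `groups` is one quantization group with its own
    scale and minimum. Returns the dequantized values.
    """
    qmax = 2 ** bits - 1
    g_min = groups.min(axis=1, keepdims=True)
    g_max = groups.max(axis=1, keepdims=True)
    scale = np.maximum(g_max - g_min, 1e-8) / qmax  # avoid division by zero
    q = np.clip(np.round((groups - g_min) / scale), 0, qmax)
    return q * scale + g_min

def fake_quant(weight, group_size, dim, bits=3):
    """Quantize then dequantize `weight` of shape (out_features, in_features).

    dim="oc": each group holds `group_size` consecutive weights inside one
              output channel (one row), so it spans several input channels.
    dim="ic": each group holds `group_size` consecutive weights inside one
              input channel (one column), so it never mixes input channels.
    """
    out_f, in_f = weight.shape
    if dim == "oc":
        assert in_f % group_size == 0
        groups = weight.reshape(-1, group_size)
        return quantize_groups(groups, bits).reshape(out_f, in_f)
    if dim == "ic":
        assert out_f % group_size == 0
        groups = weight.T.reshape(-1, group_size)
        return quantize_groups(groups, bits).reshape(in_f, out_f).T
    raise ValueError("dim must be 'oc' or 'ic'")

# Hypothetical usage on a random weight matrix.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))
W_oc = fake_quant(W, group_size=4, dim="oc")
W_ic = fake_quant(W, group_size=4, dim="ic")
```

With `dim="ic"`, all weights that multiply a given input channel share groups only with each other, which is the isolation property the abstract refers to; with `dim="oc"`, those weights are spread across groups that also contain weights from other input channels.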

Code & Models

Repositories

johnheo/adadim-llm (PyTorch, official)

Videos

Rethinking Channel Dimensions to Isolate Outliers for Low-bit Weight Quantization of Large Language Models · SlidesLive

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis

Methods: Balanced Selection