SmoothRot: Combining Channel-Wise Scaling and Rotation for Quantization-Friendly LLMs
Patrik Czakó, Gábor Kertész, Sándor Szénási

TL;DR
SmoothRot is a post-training quantization method that effectively reduces activation outliers in large language models, enabling more accurate 4-bit quantization without extra latency.
Contribution
It introduces a novel combination of channel-wise scaling and Hadamard transformations to improve quantization accuracy in LLMs.
Findings
Reduces the performance gap between quantized and FP16 models by approximately 10-30% on LLaMA2 7B, LLaMA3.1 8B, and Mistral 7B.
Effectively handles activation outliers for 4-bit quantization.
No additional inference latency introduced.
Abstract
We present SmoothRot, a novel post-training quantization technique to enhance the efficiency of 4-bit quantization in Large Language Models (LLMs). SmoothRot addresses the critical challenge of massive activation outliers by integrating channel-wise scaling with Hadamard transformations. Our technique effectively transforms extreme outliers into quantization-friendly activations, significantly improving quantization accuracy. Experiments conducted on popular LLMs (LLaMA2 7B, LLaMA3.1 8B, and Mistral 7B) demonstrate that SmoothRot consistently reduces the performance gap between quantized and FP16 models by approximately 10-30% across language generation and zero-shot reasoning tasks, without introducing additional inference latency. Code is available at https://github.com/czakop/smoothrot.
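The idea of combining channel-wise scaling with a Hadamard transformation can be illustrated with a small NumPy sketch. This is not the authors' implementation (see the repository linked above for that). The abstract does not state the scaling formula, so the sketch assumes SmoothQuant-style scales with a hypothetical migration strength `alpha`; the function names, the toy shapes, and the simple round-to-nearest 4-bit quantizer are likewise illustrative assumptions.

```python
import numpy as np

def hadamard(n):
    """Orthonormal Hadamard matrix (Sylvester construction); n must be a power of two."""
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    H = np.ones((1, 1))
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def smooth_scales(X, W, alpha=0.5):
    """Per-input-channel scales. ASSUMPTION: SmoothQuant-style formula, not taken from the source."""
    act_max = np.maximum(np.abs(X).max(axis=0), 1e-8)  # per input channel of X
    w_max = np.maximum(np.abs(W).max(axis=1), 1e-8)    # per input channel (row) of W
    return act_max ** alpha / w_max ** (1.0 - alpha)

def scale_then_rotate(X, W, alpha=0.5):
    """Return (X_t, W_t) with X_t @ W_t equal to X @ W up to floating-point error."""
    s = smooth_scales(X, W, alpha)
    H = hadamard(X.shape[1])
    X_t = (X / s) @ H               # scale activations down, then rotate
    W_t = H.T @ (W * s[:, None])    # inverse rotation and scaling folded into the weights
    return X_t, W_t

def fake_quant_int4(A, axis):
    """Symmetric round-to-nearest 4-bit quantization followed by dequantization."""
    scale = np.abs(A).max(axis=axis, keepdims=True) / 7.0
    scale = np.where(scale == 0, 1.0, scale)
    return np.clip(np.round(A / scale), -8, 7) * scale

# Toy data: 64 tokens, 128 input channels, 32 output channels (all hypothetical).
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 128))
X[:, 5] *= 100.0                    # one massive outlier channel
W = rng.normal(size=(128, 32))

X_t, W_t = scale_then_rotate(X, W)
Y_ref = X @ W

# Per-token activation quantization, per-output-channel weight quantization.
err_plain = np.abs(fake_quant_int4(X, 1) @ fake_quant_int4(W, 0) - Y_ref).mean()
err_transformed = np.abs(fake_quant_int4(X_t, 1) @ fake_quant_int4(W_t, 0) - Y_ref).mean()
print("largest |activation| before/after:", np.abs(X).max(), np.abs(X_t).max())
print("mean abs output error, plain vs transformed:", err_plain, err_transformed)
```

Because the Hadamard matrix is orthonormal and the scaling is undone on the weight side, the transformed layer computes the same output as the original one in full precision; only the values seen by the quantizer change. In this toy setup the scaling shrinks the outlier channel and the rotation spreads what remains across all channels.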
