Turning LLM Activations Quantization-Friendly

Patrik Czakó; Gábor Kertész; Sándor Szénási

arXiv:2506.01967·cs.LG·June 4, 2025

Open Access

TL;DR

This paper explores how to improve quantization of LLM activations by analyzing outliers, introducing a new metric for quantization difficulty, and proposing a hybrid scaling and rotation method to reduce quantization error.

Contribution

It introduces a novel metric for quantization difficulty and a hybrid scaling-rotation approach to enhance activation quantization in LLMs.

Findings

01. Channel-wise scaling reduces quantization error.

02. Rotation transforms outlier distributions.

03. Hybrid approach improves quantization accuracy.
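
For orientation, the three findings can be read against the standard identities that make scaling and rotation "free" in full precision. The notation here is an assumption for illustration and is not taken from the paper: X is the activation matrix (tokens by input channels), W the weight matrix, s a vector of positive per-channel scaling factors, and R an orthogonal matrix such as a normalized Hadamard matrix.

```latex
\begin{align*}
Y &= XW = \bigl(X\,\mathrm{diag}(s)^{-1}\bigr)\bigl(\mathrm{diag}(s)\,W\bigr)
   && \text{(channel-wise scaling)} \\
Y &= XW = (XR)\,(R^{\top}W), \qquad RR^{\top} = I
   && \text{(rotation)} \\
Y &= \bigl(X\,\mathrm{diag}(s)^{-1}R\bigr)\bigl(R^{\top}\mathrm{diag}(s)\,W\bigr)
   && \text{(scaling, then rotation)}
\end{align*}
```

In each case the layer output Y is unchanged before quantization; only the distributions of the two factors that get quantized differ, which is where any reduction in quantization error would come from.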

Abstract

Quantization effectively reduces the serving costs of Large Language Models (LLMs) by speeding up data movement through compressed parameters and enabling faster operations via integer arithmetic. However, activating integer arithmetic requires quantizing both weights and activations, which poses challenges due to the significant outliers in LLMs that increase quantization error. In this work, we investigate these outliers with an emphasis on their effect on layer-wise quantization error, then examine how smoothing and rotation transform the observed values. Our primary contributions include introducing a new metric to measure and visualize quantization difficulty based on channel magnitudes, as well as proposing a hybrid approach that applies channel-wise scaling before rotation, supported by a mathematical formulation of its benefits.
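
The following is a minimal NumPy sketch of the idea described in the abstract, not the authors' implementation. It builds a synthetic activation matrix with one outlier channel and compares the layer output error under 8-bit quantization for four cases: no transform, channel-wise scaling, rotation, and scaling followed by rotation. Everything here is an assumption made for illustration: the synthetic data, symmetric per-tensor quantization, the SmoothQuant-style scaling factor with exponent 0.5, the Hadamard rotation, and plain mean squared error (which is not the paper's quantization-difficulty metric).

```python
import numpy as np

def quantize(x, bits=8):
    """Symmetric per-tensor fake quantization: round to the integer grid, then dequantize."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax
    return np.clip(np.round(x / scale), -qmax, qmax) * scale

def hadamard(n):
    """Orthonormal Hadamard matrix (Sylvester construction); n must be a power of two."""
    h = np.ones((1, 1))
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h / np.sqrt(n)

def layer_error(x, w):
    """Mean squared error of x @ w when both operands are quantized first."""
    return float(np.mean((quantize(x) @ quantize(w) - x @ w) ** 2))

# Synthetic data (assumption): Gaussian activations with one large outlier channel.
rng = np.random.default_rng(0)
tokens, d_in, d_out = 64, 128, 128
x = rng.normal(size=(tokens, d_in))
x[:, 5] *= 50.0
w = rng.normal(size=(d_in, d_out)) * 0.1

# Channel-wise scaling: move part of each channel's magnitude from x into w.
# s_j = sqrt(max|x_j| / max|w_j|) is the SmoothQuant-style factor with exponent 0.5.
s = np.sqrt(np.abs(x).max(axis=0) / np.abs(w).max(axis=1))
x_s, w_s = x / s, w * s[:, None]

# Rotation: multiply by an orthogonal matrix to spread outliers across channels.
r = hadamard(d_in)
x_r, w_r = x @ r, r.T @ w

# Hybrid: scale first, then rotate the already-scaled operands.
x_h, w_h = x_s @ r, r.T @ w_s

errors = {
    "none": layer_error(x, w),
    "scaling": layer_error(x_s, w_s),
    "rotation": layer_error(x_r, w_r),
    "hybrid": layer_error(x_h, w_h),
}
for name, err in errors.items():
    print(f"{name:>8}: {err:.6f}")
```

All three transforms leave the full-precision product unchanged, so any difference in the printed errors comes only from how the transformed operands respond to quantization. The numbers depend on the synthetic data and should not be read as results from the paper.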

Taxonomy

Topics: Magnetic confinement fusion research · Advancements in Photolithography Techniques · Mathematics, Computing, and Information Processing