Dissecting Quantization Error: A Concentration-Alignment Perspective
Marco Federici, Boris van Breugel, Paul Whatmough, Markus Nagel

TL;DR
This paper analyzes quantization error in large models and shows that improving the alignment between weights and activations, in addition to their concentration, can substantially reduce that error. This insight motivates a new transform method called CAT.
Contribution
The paper introduces a theoretical framework for understanding quantization error and proposes the block Concentration-Alignment Transform (CAT) to improve quantization accuracy.
Findings
CAT matches or outperforms prior methods at 4-bit precision
Improving alignment between weights and activations reduces quantization error
The framework explains the effectiveness of function-preserving transforms
Abstract
Quantization can drastically increase the efficiency of large language and vision models, but typically incurs an accuracy drop. Recently, function-preserving transforms (e.g. rotations, the Hadamard transform, channel-wise scaling) have been successfully applied to reduce post-training quantization error, yet a principled explanation remains elusive. We analyze linear-layer quantization via the signal-to-quantization-noise ratio (SQNR), showing that for uniform integer quantization at a fixed bit width, SQNR decomposes into (i) the concentration of weights and activations (capturing spread and outliers), and (ii) the alignment of their dominant variation directions. This reveals an actionable insight: beyond concentration, which is the focus of most prior transforms (e.g. rotations or the Hadamard transform), improving the alignment between weights and activations can further reduce quantization error. Motivated by…
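The SQNR measurement and the effect of a function-preserving rotation can be illustrated with a small NumPy sketch. It assumes the common definition SQNR = 10·log10(‖WX‖² / ‖WX − Q(W)Q(X)‖²), symmetric round-to-nearest INT4 quantization with per-output-channel weight scales and per-token activation scales, and a hypothetical toy layer in which one activation channel is inflated 50x while the matching weight column is shrunk by the same factor. The helper names `quantize`, `sqnr_db` and `hadamard` are illustrative only; the sketch does not implement the paper's concentration/alignment decomposition or CAT itself.

```python
import numpy as np


def quantize(t: np.ndarray, bits: int, axis: int) -> np.ndarray:
    """Symmetric uniform integer quantization (round-to-nearest), returned dequantized.

    One scale is computed per slice along `axis` (assumed granularity:
    per output channel for weights, per token for activations).
    """
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(t), axis=axis, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)  # avoid division by zero on all-zero slices
    return np.clip(np.round(t / scale), -qmax, qmax) * scale


def sqnr_db(W: np.ndarray, X: np.ndarray, bits: int = 4) -> float:
    """SQNR (in dB) of Y = W @ X when both W and X are quantized."""
    Y = W @ X
    # W: (d_out, d_in), one scale per row; X: (d_in, n_tokens), one scale per column.
    Y_q = quantize(W, bits, axis=1) @ quantize(X, bits, axis=0)
    return float(10 * np.log10(np.sum(Y**2) / np.sum((Y - Y_q) ** 2)))


def hadamard(n: int) -> np.ndarray:
    """Orthonormal Sylvester Hadamard matrix; n must be a power of two."""
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)


rng = np.random.default_rng(0)
d_in, d_out, n_tokens = 256, 32, 256
W = rng.standard_normal((d_out, d_in))
X = rng.standard_normal((d_in, n_tokens))

# Hypothetical outlier channel: inflate activation channel 3 and shrink the
# matching weight column by the same factor (toy setup, not from the paper).
X[3] *= 50.0
W[:, 3] /= 50.0

# Function-preserving rotation: (W H^T)(H X) = W X because H^T H = I.
H = hadamard(d_in)
W_rot, X_rot = W @ H.T, H @ X
assert np.allclose(W @ X, W_rot @ X_rot)

sqnr_orig = sqnr_db(W, X)
sqnr_rot = sqnr_db(W_rot, X_rot)
print(f"INT4 SQNR original: {sqnr_orig:.1f} dB, after Hadamard rotation: {sqnr_rot:.1f} dB")
```

The rotation leaves the float output unchanged but spreads the outlier channel across all inputs, so the per-token scales no longer round the remaining channels to zero. In this toy setup the rotated layer should report a noticeably higher SQNR, although the exact values depend on the random seed and the assumed quantization granularity.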
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
