Dissecting Quantization Error: A Concentration-Alignment Perspective

Marco Federici; Boris van Breugel; Paul Whatmough; Markus Nagel

arXiv:2603.04359·cs.LG·March 5, 2026

TL;DR

This paper analyzes quantization error in large models and shows that improving the alignment between weights and activations, in addition to their concentration, can further reduce that error. This insight motivates a new transform called CAT.

Contribution

The paper introduces a theoretical framework for understanding quantization error and proposes the block Concentration-Alignment Transform (CAT) to improve quantization accuracy.

Findings

01  CAT matches or outperforms prior methods at 4-bit precision

02  Improving alignment between weights and activations reduces quantization error

03  The framework explains the effectiveness of function-preserving transforms

Abstract

Quantization can drastically increase the efficiency of large language and vision models, but typically incurs an accuracy drop. Recently, function-preserving transforms (e.g. rotations, Hadamard transform, channel-wise scaling) have been successfully applied to reduce post-training quantization error, yet a principled explanation remains elusive. We analyze linear-layer quantization via the signal-to-quantization-noise ratio (SQNR), showing that for uniform integer quantization at a fixed bit width, SQNR decomposes into (i) the concentration of weights and activations (capturing spread and outliers), and (ii) the alignment of their dominant variation directions. This reveals an actionable insight: beyond concentration, the focus of most prior transforms (e.g. rotations or Hadamard), improving alignment between weights and activations can further reduce quantization error. Motivated by…
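To make the quantities in the abstract concrete, the following is a minimal NumPy sketch of measuring the SQNR of a quantized linear layer before and after a function-preserving rotation. It is not the paper's CAT method and does not reproduce its concentration-alignment decomposition. The symmetric per-tensor quantizer, the random orthogonal matrix, the synthetic data with one outlier channel, and all function names (`quantize_uniform`, `sqnr_db`, `linear_sqnr_db`) are assumptions made for illustration.

```python
import numpy as np


def quantize_uniform(t, bits=4):
    """Symmetric per-tensor uniform integer quantization (quantize, then dequantize).

    Assumes t contains at least one non-zero entry.
    """
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(t)) / qmax
    q = np.clip(np.round(t / scale), -qmax - 1, qmax)
    return q * scale


def sqnr_db(reference, approx):
    """Signal-to-quantization-noise ratio in decibels."""
    signal = np.sum(reference ** 2)
    noise = np.sum((reference - approx) ** 2)
    return 10.0 * np.log10(signal / noise)


def linear_sqnr_db(W, X, bits=4):
    """SQNR of a linear layer Y = X W^T when both W and X are quantized."""
    Y = X @ W.T
    Y_q = quantize_uniform(X, bits) @ quantize_uniform(W, bits).T
    return sqnr_db(Y, Y_q)


rng = np.random.default_rng(0)
d_in, d_out, n = 64, 32, 256
W = rng.normal(size=(d_out, d_in))
X = rng.normal(size=(n, d_in))
X[:, 0] *= 50.0  # hypothetical outlier channel in the activations

# Random orthogonal matrix R (an assumption; Hadamard matrices are another common choice).
R, _ = np.linalg.qr(rng.normal(size=(d_in, d_in)))

# Function-preserving transform: (X R)(W R)^T = X R R^T W^T = X W^T.
X_rot, W_rot = X @ R, W @ R

print("outputs unchanged in full precision:", np.allclose(X @ W.T, X_rot @ W_rot.T))
print("SQNR without transform (dB):", linear_sqnr_db(W, X))
print("SQNR with rotation (dB):   ", linear_sqnr_db(W_rot, X_rot))
```

The rotation leaves the full-precision output unchanged but alters the distributions that the quantizer sees. Whether the measured SQNR rises or falls in this toy setting depends on the synthetic data and the quantizer granularity, so no particular outcome should be read into it.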

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications