Activation Sensitivity as a Unifying Principle for Post-Training Quantization
Bruce Changlong Xu

TL;DR
This paper introduces a unified theoretical framework for post-training quantization of large language models based on activation sensitivity, which explains and connects existing heuristics like activation-aware and second-order methods.
Contribution
It formalizes activation sensitivity as a key measure for channel importance, unifying different PTQ approaches under a common theoretical foundation.
Findings
Under a first-order Taylor expansion, sensitivity is the squared norm of gradient-weighted activations.
AWQ and GPTQ are approximations of activation sensitivity.
The framework connects gradient saliency, Fisher information, and Hessian criteria.
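The first finding can be made concrete with a short worked derivation. This is a sketch under stated assumptions, not the paper's own notation: the layer is taken to be linear, y = Wx, g denotes the loss gradient with respect to y, and the perturbation \delta_j applied to the j-th input channel (column) of W is assumed zero-mean, isotropic with variance \sigma^2, and independent of the data.

```latex
% Assumed setup: y = W x, \; g = \partial \mathcal{L} / \partial y, \;
% W_{:,j} \mapsto W_{:,j} + \delta_j, \;
% \mathbb{E}[\delta_j] = 0, \; \mathbb{E}[\delta_j \delta_j^\top] = \sigma^2 I.
\begin{align*}
\Delta \mathcal{L}
  &\approx \langle \nabla_W \mathcal{L}, \Delta W \rangle
   = \langle g x^\top, \delta_j e_j^\top \rangle
   = x_j \, g^\top \delta_j
   && \text{(first-order Taylor expansion)} \\
\mathbb{E}\!\left[ (\Delta \mathcal{L})^2 \right]
  &\approx \mathbb{E}\!\left[ x_j^2 \, g^\top \delta_j \delta_j^\top g \right]
   = \sigma^2 \, \mathbb{E}\!\left[ \lVert x_j \, g \rVert^2 \right]
   && \text{(average over } \delta_j \text{ and the data)}
\end{align*}
```

Under these assumptions the channel score is s_j = E[ ||x_j g||^2 ], the squared norm of the activation x_j weighted by the gradient g. If ||g|| is treated as roughly constant across samples, s_j reduces to a multiple of E[x_j^2], which is one way to read the claim that activation-magnitude heuristics approximate sensitivity.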
Abstract
Post-training quantization (PTQ) methods for large language models rely on heuristics that implicitly estimate which weight channels most strongly influence model behavior. Two dominant paradigms have emerged: activation-aware methods such as AWQ prioritize channels with large activation magnitudes, while second-order methods such as GPTQ allocate quantization error according to input covariance structure. Despite strong empirical performance, these approaches remain conceptually fragmented, and it is unclear what underlying quantity they are approximating. In this work, we present a unified theoretical framework for PTQ by formalizing activation sensitivity, defined as the expected impact of channel-wise perturbations on the loss. Using a first-order Taylor expansion, we show that sensitivity naturally arises as the squared norm of gradient-weighted activations, yielding a principled…
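The quantity described in the abstract can be illustrated numerically. The following is a minimal sketch, assuming a linear layer y = Wx and taking the sensitivity of input channel j to be the sample mean of x_j^2 * ||g||^2, where g is the loss gradient with respect to y. The function names `activation_sensitivity` and `activation_only_proxy` are hypothetical and are not taken from the paper, AWQ, or GPTQ; the proxy is only in the spirit of activation-aware methods.

```python
import numpy as np

def activation_sensitivity(acts, grads):
    """Per-channel score: mean over samples of x_j^2 * ||g||^2.

    acts:  (N, d_in) layer inputs x (assumed calibration samples)
    grads: (N, d_out) loss gradients g = dL/dy for the same samples
    Returns an array of shape (d_in,).
    """
    acts = np.asarray(acts, dtype=float)
    grads = np.asarray(grads, dtype=float)
    grad_sq_norm = np.sum(grads ** 2, axis=1)          # ||g||^2 per sample
    return np.mean(acts ** 2 * grad_sq_norm[:, None], axis=0)

def activation_only_proxy(acts):
    """Gradient-free proxy: mean squared activation per input channel."""
    return np.mean(np.asarray(acts, dtype=float) ** 2, axis=0)

# Synthetic calibration data: channel 3 has the largest activations,
# channel 0 the smallest.
rng = np.random.default_rng(0)
acts = rng.normal(size=(256, 4)) * np.array([0.1, 1.0, 1.0, 5.0])
grads = rng.normal(size=(256, 3))

sensitivity = activation_sensitivity(acts, grads)
proxy = activation_only_proxy(acts)

# With a constant gradient norm, the score is an exact multiple of the proxy.
const_grads = np.ones((256, 3))                        # ||g||^2 = 3 everywhere
sensitivity_const = activation_sensitivity(acts, const_grads)
```

When the gradient norm is constant across samples, the sensitivity score and the activation-only proxy rank channels identically. They can disagree when gradient magnitude correlates with particular activations, which is the case the gradient-weighted form is meant to capture.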
Taxonomy
Topics: Neurobiology of Language and Bilingualism · Neural dynamics and brain function · Domain Adaptation and Few-Shot Learning
