S2D: Selective Spectral Decay for Quantization-Friendly Conditioning of Neural Activations
Arnav Chavan, Nahush Lele, Udbhav Bamba, Sankalp Dayal, Aditi Raghunathan, Deepak Gupta

TL;DR
This paper introduces S2D, a spectral decay method that regularizes the dominant singular values of neural network weights to reduce activation outliers, thereby improving quantization accuracy and enabling efficient deployment of large-scale models.
Contribution
The paper presents a novel spectral decay regularization technique that targets the dominant singular values of the weights to make neural activations more quantization-friendly.
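The idea of decaying only the dominant singular values can be made concrete with a small NumPy sketch. It assumes a penalty of the form λ·Σ_{i<k} σ_i(W)², i.e. weight decay applied only to the top-k singular values. This penalty form, the function name `selective_spectral_decay_step`, and the parameters `k`, `lam`, `lr` are illustrative assumptions, not the authors' published formulation. Because ∂σ_i/∂W = u_i v_iᵀ for distinct singular values, one gradient step scales each of the top-k singular values by (1 − 2·lr·lam) and leaves the rest of the spectrum untouched.

```python
import numpy as np


def selective_spectral_decay_step(W: np.ndarray, k: int, lam: float, lr: float) -> np.ndarray:
    """One gradient step on the ASSUMED penalty lam * sum_{i<k} sigma_i(W)**2.

    For distinct singular values, d sigma_i / dW = u_i v_i^T, so the penalty
    gradient is 2 * lam * sum_{i<k} sigma_i u_i v_i^T. Only the top-k singular
    directions are updated; the rest of W is left exactly as it was.
    """
    U, S, Vt = np.linalg.svd(W, full_matrices=False)  # S is sorted descending
    grad = 2.0 * lam * (U[:, :k] * S[:k]) @ Vt[:k, :]
    return W - lr * grad


rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32))
k, lam, lr = 4, 0.5, 0.1  # hypothetical hyperparameters

W_new = selective_spectral_decay_step(W, k=k, lam=lam, lr=lr)

s_old = np.linalg.svd(W, compute_uv=False)
s_new = np.linalg.svd(W_new, compute_uv=False)

# Expected spectrum: top-k values scaled by (1 - 2*lr*lam) = 0.9, others unchanged.
# Re-sort because a shrunk value may drop below a previously smaller, untouched one.
expected = np.concatenate([s_old[:k] * (1 - 2 * lr * lam), s_old[k:]])
expected = np.sort(expected)[::-1]
print(np.allclose(s_new, expected))                   # True
print(np.linalg.matrix_rank(W - W_new, tol=1e-8))     # 4: the update is rank-k
```

In an actual fine-tuning loop such a penalty would typically be added to the task loss and differentiated with an autodiff framework; a full SVD per step is costly for large layers, so power iteration or a truncated SVD is a common substitute. Whether S2D makes these particular choices is not stated in this summary.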
Findings
S2D reduces activation outliers and improves post-training quantization (PTQ) accuracy by up to 7%.
Models trained with S2D show better quantization robustness across tasks.
S2D enables scaling large models without sacrificing deployment efficiency.
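Why outliers hurt PTQ can be seen with symmetric per-tensor int8 quantization, where the scale is max|x|/127: one large value stretches the rounding step for every other entry. The sketch below is a generic illustration, not the paper's quantization setup; the function name `fake_quantize_int8`, the Gaussian activations, and the injected outlier value of 100.0 are all synthetic assumptions.

```python
import numpy as np


def fake_quantize_int8(x: np.ndarray) -> np.ndarray:
    """Symmetric per-tensor int8 quantize-then-dequantize.

    The scale comes from the single largest magnitude in the tensor, so one
    outlier enlarges the rounding step applied to every other entry.
    """
    scale = np.max(np.abs(x)) / 127.0
    q = np.clip(np.round(x / scale), -127, 127)
    return q * scale


rng = np.random.default_rng(0)
acts = rng.standard_normal(4096)      # synthetic well-behaved activations
acts_outlier = acts.copy()
acts_outlier[0] = 100.0               # one synthetic outlier value

mse = {}
for name, x in [("no outlier", acts), ("with outlier", acts_outlier)]:
    err = x - fake_quantize_int8(x)
    # Measure error on the ordinary entries only (index 0 excluded in both cases).
    mse[name] = float(np.mean(err[1:] ** 2))
    print(f"{name}: scale={np.max(np.abs(x)) / 127.0:.4f}, MSE={mse[name]:.2e}")
```

With the outlier present the scale grows from roughly 0.03 to about 0.79, and the reconstruction error on the remaining entries rises by orders of magnitude, which is the range problem that reducing outliers is meant to address.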
Abstract
Activation outliers in large-scale transformer models pose a fundamental challenge to model quantization, creating excessively large activation ranges that cause severe accuracy drops once the model is quantized. We empirically observe that outlier severity intensifies with pre-training scale (e.g., progressing from CLIP to the more extensively trained SigLIP and SigLIP2). Through theoretical analysis as well as empirical correlation studies, we establish a direct link between these activation outliers and the dominant singular values of the weights. Building on this insight, we propose Selective Spectral Decay (S2D), a geometrically principled conditioning method that surgically regularizes only the weight components corresponding to the largest singular values during fine-tuning. Through extensive experiments, we demonstrate that S2D significantly reduces activation outliers and produces…
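One standard way to see why dominant singular values matter: for y = Wx, max_j |y_j| ≤ ‖y‖₂ ≤ σ₁(W)‖x‖₂, so the largest singular value caps the largest activation a layer can produce. This is a textbook norm inequality and may differ from the paper's own theoretical analysis. The NumPy sketch below uses synthetic Gaussian weights and adds one hypothetical dominant rank-one component (gain 50.0, chosen for illustration) to show that it inflates both σ₁ and the observed peak activation.

```python
import numpy as np

rng = np.random.default_rng(1)
d_in, d_out = 256, 256

# Synthetic baseline layer: entries ~ N(0, 1/d_in), so outputs have roughly unit scale.
W = rng.standard_normal((d_out, d_in)) / np.sqrt(d_in)

# Hypothetical dominant component: a rank-one term with gain 50 along unit vectors u, v.
u = rng.standard_normal(d_out)
u /= np.linalg.norm(u)
v = rng.standard_normal(d_in)
v /= np.linalg.norm(v)
W_spiky = W + 50.0 * np.outer(u, v)

X = rng.standard_normal((1000, d_in))  # batch of synthetic inputs, one per row
max_input_norm = np.max(np.linalg.norm(X, axis=1))

results = {}
for name, M in [("baseline", W), ("dominant component", W_spiky)]:
    sigma1 = np.linalg.norm(M, ord=2)   # largest singular value of M
    Y = X @ M.T                         # activations y = M x for every row x
    results[name] = {
        "sigma1": sigma1,
        "max_act": np.max(np.abs(Y)),
        "bound": sigma1 * max_input_norm,  # sigma_1 * max ||x||_2
    }
    print(name, {key: round(float(val), 2) for key, val in results[name].items()})
```

The bound is loose, but it scales with σ₁, so shrinking only the dominant singular values (the part of the spectrum S2D regularizes) can lower peak activations without altering the rest of the spectrum.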
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis
