Colinearity Decay: Training Quantization-Friendly ViTs with Outlier Decay
Jin Tong, Guang Liang, Peilin Sun, Jianxin Wu

TL;DR
This paper introduces Colinearity-Decay, a structural regularizer for vision Transformers that reduces activation outliers, enabling more effective low-bit quantization without architecture changes or inference overhead.
Contribution
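The paper proposes Colinearity-Decay, a structural regularizer for ordered matrix pairs within Transformer blocks. It penalizes detrimental cross-matrix alignment to control harmful activation amplification in vision Transformers, improving quantization performance.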
Findings
Consistently improves quantized accuracy across multiple tasks.
Preserves or enhances full-precision model performance.
Requires minimal training overhead.
Abstract
Low-bit quantization is a practical route for efficiently deploying vision Transformers, yet activation outliers complicate fully quantized deployment. Existing methods either handle quantization post-training or suppress large activations during training; however, aggressively restricting outliers in vision models can lead to a poorer trade-off between full-precision and quantized accuracy. We argue that rather than simply suppressing outliers, the training objective should control the structural amplification that makes them harmful. To this end, we introduce Colinearity-Decay (CD), a structural regularizer for ordered matrix pairs within Transformer blocks. CD penalizes detrimental cross-matrix alignment and mitigates extreme activations without altering the architecture or task loss. Applied as a decoupled update, CD is non-invasive and introduces minimal training overhead. Across…
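To make the motivation concrete, here is a minimal sketch of why a single activation outlier hurts low-bit quantization. It is not the paper's method. It assumes a plain symmetric per-tensor uniform quantizer, and the function names, the 4-bit setting, and the activation values are all hypothetical choices made for illustration.

```python
def quantize_dequantize(values, num_bits=4):
    """Symmetric per-tensor uniform quantization followed by dequantization.

    Assumption: one scale for the whole tensor, set by the largest magnitude.
    """
    qmax = 2 ** (num_bits - 1) - 1  # e.g. 7 for signed 4-bit
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return list(values)
    scale = max_abs / qmax
    reconstructed = []
    for v in values:
        q = round(v / scale)            # nearest integer level
        q = max(-qmax, min(qmax, q))    # clamp to the representable range
        reconstructed.append(q * scale)
    return reconstructed


def mse_on_prefix(values, reconstructed, count):
    """Mean squared error over the first `count` entries only."""
    pairs = zip(values[:count], reconstructed[:count])
    return sum((a - b) ** 2 for a, b in pairs) / count


# Hypothetical activations: 17 "typical" values in roughly [-0.8, 0.8].
typical = [0.1 * i - 0.8 for i in range(17)]
# The same tensor with one large outlier appended.
with_outlier = typical + [40.0]

# Error on the typical values, without and with the outlier in the tensor.
err_clean = mse_on_prefix(typical, quantize_dequantize(typical), len(typical))
err_outlier = mse_on_prefix(
    with_outlier, quantize_dequantize(with_outlier), len(typical)
)

print("MSE on typical values, no outlier:  ", err_clean)
print("MSE on typical values, with outlier:", err_outlier)
```

Because the outlier sets the scale, every typical value in this toy tensor rounds to the zero level, so the error on the typical values is far larger than when the outlier is absent. This is the deployment problem the abstract refers to. How Colinearity-Decay addresses it during training is described in the paper itself.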
