Colinearity Decay: Training Quantization-Friendly ViTs with Outlier Decay

Jin Tong; Guang Liang; Peilin Sun; Jianxin Wu

arXiv:2605.01330 · cs.CV · May 5, 2026

TL;DR

This paper introduces Colinearity-Decay, a structural regularizer for vision Transformers that reduces activation outliers, enabling more effective low-bit quantization without architecture changes or inference overhead.
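To make the link between activation outliers and low-bit quantization concrete, the sketch below quantizes a synthetic activation vector to 4 bits with a single per-tensor scale, once without and once with one extreme value. The data, the 4-bit setting, and the helper names `quantize_dequantize` and `mse_on_bulk` are illustrative assumptions and do not come from the paper.

```python
import numpy as np

def quantize_dequantize(x, bits=4):
    """Symmetric per-tensor uniform quantization followed by dequantization."""
    qmax = 2 ** (bits - 1) - 1          # 7 for signed 4-bit
    scale = np.max(np.abs(x)) / qmax    # one scale shared by the whole tensor
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q * scale

def mse_on_bulk(x):
    """Quantization error on every entry except index 0 (the outlier slot)."""
    return float(np.mean((quantize_dequantize(x)[1:] - x[1:]) ** 2))

rng = np.random.default_rng(0)
acts = rng.normal(0.0, 1.0, size=4096)  # synthetic "activations" (assumption)
with_outlier = acts.copy()
with_outlier[0] = 100.0                 # a single extreme activation

mse_clean = mse_on_bulk(acts)
mse_outlier = mse_on_bulk(with_outlier)
print(f"4-bit MSE without outlier: {mse_clean:.4f}")
print(f"4-bit MSE with outlier:    {mse_outlier:.4f}")
```

Because the scale is set by the largest magnitude, one outlier stretches the quantization grid so far that most ordinary values round to zero, which is why the error on the bulk of the tensor rises sharply.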

Contribution

The paper proposes Colinearity-Decay (CD), a regularizer that acts on ordered matrix pairs within Transformer blocks to control the structural amplification that makes activation outliers harmful, improving quantized performance.

Findings

01 Consistently improves quantized accuracy across multiple tasks.

02 Preserves or enhances full-precision model performance.

03 Requires minimal training overhead.

Abstract

Low-bit quantization is a practical route for efficiently deploying vision Transformers, yet activation outliers complicate fully quantized deployment. Existing methods either handle quantization post-training or suppress large activations during training; however, aggressively restricting outliers in vision models can lead to a poorer trade-off between full-precision and quantized accuracy. We argue that rather than simply suppressing outliers, the training objective should control the structural amplification that makes them harmful. To this end, we introduce Colinearity-Decay (CD), a structural regularizer for ordered matrix pairs within Transformer blocks. CD penalizes detrimental cross-matrix alignment and mitigates extreme activations without altering the architecture or task loss. Applied as a decoupled update, CD is non-invasive and introduces minimal training overhead. Across…
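The excerpt above does not give the CD penalty itself, so the following is only a rough sketch of what a decoupled regularizer update on an ordered matrix pair could look like. It uses a hypothetical stand-in penalty, 0.5 * ||W2 W1||_F^2 (the sum of squared inner products between rows of W2 and columns of W1), which is an assumption made for illustration and should not be read as the paper's formula. The names `alignment_penalty`, `decoupled_penalty_step`, `lr`, and `strength` are likewise hypothetical.

```python
import numpy as np

def alignment_penalty(w1, w2):
    """Hypothetical stand-in penalty (not the paper's): 0.5 * ||w2 @ w1||_F^2.

    Each entry of w2 @ w1 is the inner product of a row of w2 with a
    column of w1, so the value grows when those vectors align.
    """
    return 0.5 * float(np.sum((w2 @ w1) ** 2))

def decoupled_penalty_step(w1, w2, lr, strength):
    """One gradient step on the stand-in penalty only.

    The step is kept separate from the task-loss optimizer step, in the
    same spirit as decoupled weight decay.
    """
    prod = w2 @ w1
    grad_w1 = w2.T @ prod   # d penalty / d w1
    grad_w2 = prod @ w1.T   # d penalty / d w2
    return w1 - lr * strength * grad_w1, w2 - lr * strength * grad_w2

rng = np.random.default_rng(0)
w1 = rng.normal(scale=0.1, size=(16, 8))  # first matrix of the ordered pair
w2 = rng.normal(scale=0.1, size=(8, 16))  # second matrix, applied after w1

before = alignment_penalty(w1, w2)
for _ in range(100):
    # A real training loop would run the task-loss optimizer step here.
    w1, w2 = decoupled_penalty_step(w1, w2, lr=0.1, strength=1.0)
after = alignment_penalty(w1, w2)
print(f"penalty before: {before:.6f}, after: {after:.6f}")
```

Keeping the penalty step outside the task loss means the loss function and the model definition stay unchanged, which is consistent with the abstract's description of CD as non-invasive, although the actual update rule may differ from this sketch.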
