Oscillation-free Quantization for Low-bit Vision Transformers
Shih-Yang Liu, Zechun Liu, Kwang-Ting Cheng

TL;DR
This paper introduces techniques to eliminate weight oscillation in low-bit quantized vision transformers, significantly improving accuracy and stability during training.
Contribution
It proposes three methods, statistical weight quantization (StatsQ), confidence-guided annealing (CGA), and query-key reparameterization (QKR), to reduce weight oscillation and improve quantization robustness in vision transformers.
Findings
Mitigated weight oscillation during low-bit quantization-aware training of vision transformers.
Improved ImageNet accuracy over the previous state of the art by up to 9.8% (reported for 2-bit DeiT-T).
Abstract
Weight oscillation is an undesirable side effect of quantization-aware training, in which quantized weights frequently jump between two quantized levels, resulting in training instability and a sub-optimal final model. We discover that the learnable scaling factor, a widely used setting in quantization, aggravates weight oscillation. In this study, we investigate the connection between the learnable scaling factor and quantized weight oscillation and use ViT as a case driver to illustrate the findings and remedies. In addition, we found that the interdependence between quantized weights in the query and key of a self-attention layer makes ViT vulnerable to oscillation. We therefore propose three techniques accordingly: statistical weight quantization (StatsQ) to improve quantization robustness compared to the prevalent…
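The oscillation described in the abstract can be reproduced in a toy setting. The sketch below is not the paper's method or experimental setup: it assumes a single latent weight, a fixed scale of 1.0, a squared-error loss toward a target lying between two quantization levels, and a straight-through estimator for the rounding step. All function names and constants are hypothetical and chosen only for illustration.

```python
import math


def quantize(w, scale):
    """Uniform quantizer: snap w to the nearest multiple of scale."""
    return math.floor(w / scale + 0.5) * scale


def train_toy(w0=0.45, target=0.4, scale=1.0, lr=0.1, steps=50):
    """Gradient descent on 0.5 * (q(w) - target)**2.

    The rounding step has zero gradient almost everywhere, so the
    straight-through estimator (an assumption of this sketch) treats
    dq/dw as 1 and passes the gradient to the latent weight w.
    """
    w = w0
    history = []
    for _ in range(steps):
        q = quantize(w, scale)
        grad = q - target  # STE gradient with respect to w
        w -= lr * grad
        history.append(q)
    return history


def count_flips(history):
    """Number of steps at which the quantized value changes level."""
    return sum(1 for a, b in zip(history, history[1:]) if a != b)


history = train_toy()
flips = count_flips(history)
print("levels visited:", sorted(set(history)))
print("level changes in", len(history), "steps:", flips)
```

Because the target sits between the levels 0 and 1, neither level gives zero gradient: the latent weight is pushed across the rounding threshold, the sign of the gradient reverses, and the weight is pushed back. The quantized value therefore keeps switching between the two levels instead of settling.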
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
