Q-ViT: Accurate and Fully Quantized Low-bit Vision Transformer
Yanjing Li, Sheng Xu, Baochang Zhang, Xianbin Cao, Peng Gao, Guodong Guo

TL;DR
This paper introduces Q-ViT, a fully quantized low-bit vision transformer that maintains high accuracy and efficiency through novel modules, enabling deployment on resource-constrained devices with minimal performance loss.
Contribution
The paper proposes an information rectification module and distribution guided distillation to fully quantize ViTs, significantly reducing computation while preserving or improving accuracy.
Findings
Q-ViT achieves about 80.9% Top-1 accuracy on ImageNet with ViT-S.
Q-ViT theoretically accelerates ViT-S by 6.14x.
Q-ViT surpasses its full-precision counterpart by 1.0% Top-1 accuracy on ImageNet.
Abstract
The large pre-trained vision transformers (ViTs) have demonstrated remarkable performance on various visual tasks, but suffer from expensive computational and memory cost problems when deployed on resource-constrained devices. Among the powerful compression approaches, quantization extremely reduces the computation and memory consumption by low-bit parameters and bit-wise operations. However, low-bit ViTs remain largely unexplored and usually suffer from a significant performance drop compared with the real-valued counterparts. In this work, through extensive empirical analysis, we first identify the bottleneck for severe performance drop comes from the information distortion of the low-bit quantized self-attention map. We then develop an information rectification module (IRM) and a distribution guided distillation (DGD) scheme for fully quantized vision transformers (Q-ViT) to…
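The information distortion the abstract attributes to the low-bit self-attention map can be illustrated with a small sketch. It assumes a generic uniform quantizer over [0, 1] and made-up attention logits. The helper names `softmax` and `quantize_unsigned` are hypothetical and this is not the paper's IRM or DGD. The sketch only shows how many distinct attention values survive at each bit-width.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def quantize_unsigned(values, bits):
    """Uniformly quantize values in [0, 1] onto 2**bits evenly spaced levels.

    Generic illustrative quantizer (an assumption), not the paper's scheme.
    """
    levels = 2 ** bits - 1
    return [round(v * levels) / levels for v in values]

# Hypothetical attention logits for one query token.
scores = [2.0, 1.0, 0.5, 0.1, -1.0, -2.0]
attn = softmax(scores)

for bits in (8, 4, 2):
    q = quantize_unsigned(attn, bits)
    distinct = len(set(q))                      # surviving attention levels
    max_err = max(abs(a - b) for a, b in zip(attn, q))
    print(f"{bits}-bit: {distinct} distinct values, max error {max_err:.4f}")
```

At low bit-widths several attention weights fall onto the same level, so the quantized map can no longer distinguish the tokens they belong to. This is one simple way to picture the kind of distortion the abstract describes.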
Taxonomy
Topics: CCD and CMOS Imaging Sensors · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques
Methods: Linear Layer · Dense Connections · Multi-Head Attention · Softmax · Attention Dropout · Feedforward Network · Dropout · Attention Is All You Need · Data-efficient Image Transformer
