Q-ViT: Accurate and Fully Quantized Low-bit Vision Transformer

Yanjing Li; Sheng Xu; Baochang Zhang; Xianbin Cao; Peng Gao; Guodong Guo

arXiv:2210.06707 · cs.CV · October 14, 2022 · 31 cites

TL;DR

This paper introduces Q-ViT, a fully quantized low-bit vision transformer. It uses an information rectification module (IRM) and distribution guided distillation (DGD) to keep accuracy close to, or above, that of the full-precision model, which makes deployment on resource-constrained devices practical.

Contribution

The paper proposes an information rectification module and distribution guided distillation to fully quantize ViTs, significantly reducing computation while preserving or improving accuracy.

Findings

01 Q-ViT achieves about 80.9% Top-1 accuracy on ImageNet.

02 Q-ViT theoretically accelerates ViT-S by 6.14×.

03 Q-ViT surpasses its full-precision counterpart by 1.0% Top-1 accuracy on ImageNet.

Abstract

Large pre-trained vision transformers (ViTs) have demonstrated remarkable performance on various visual tasks, but they incur high computational and memory costs when deployed on resource-constrained devices. Among the powerful compression approaches, quantization greatly reduces computation and memory consumption by using low-bit parameters and bit-wise operations. However, low-bit ViTs remain largely unexplored and usually suffer a significant performance drop compared with their real-valued counterparts. In this work, through extensive empirical analysis, we first identify that the bottleneck behind the severe performance drop is the information distortion of the low-bit quantized self-attention map. We then develop an information rectification module (IRM) and a distribution guided distillation (DGD) scheme for fully quantized vision transformers (Q-ViT) to…
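To make the "information distortion" claim concrete, the following is a minimal sketch of generic uniform quantization applied to one row of a softmax attention map. It is not the paper's IRM or DGD, and the helper names (`quantize_unsigned`, `softmax`, `distinct_levels`) and the example scores are assumptions chosen for illustration. It only shows that as the bit-width shrinks, fewer distinct attention values survive and the worst-case rounding error grows.

```python
import math


def quantize_unsigned(values, bits, max_value=1.0):
    """Uniformly quantize values in [0, max_value] onto 2**bits levels,
    then map them back to real numbers (quantize followed by dequantize)."""
    levels = 2 ** bits - 1          # number of steps between 0 and max_value
    step = max_value / levels       # spacing between adjacent levels
    out = []
    for v in values:
        clipped = min(max(v, 0.0), max_value)
        out.append(round(clipped / step) * step)
    return out


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def distinct_levels(values):
    """Count how many different values remain after quantization."""
    return len(set(round(v, 12) for v in values))


# Hypothetical attention logits for one query token (illustrative only).
scores = [2.0, 1.0, 0.5, 0.0, -0.5, -1.0, -2.0, -3.0]
attn = softmax(scores)

for bits in (8, 4, 2):
    q = quantize_unsigned(attn, bits)
    max_err = max(abs(a - b) for a, b in zip(attn, q))
    print(f"{bits}-bit: {distinct_levels(q)} distinct values, "
          f"max abs error {max_err:.4f}")
```

Running the loop for 8, 4, and 2 bits gives a rough feel for why a low-bit attention map can lose much of the ranking information carried by the full-precision one. The paper's own analysis and remedies are described in the full text and the official repository.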

Code & Models

Repositories

yanjingli0202/q-vit
PyTorch · Official

Videos

Q-ViT: Accurate and Fully Quantized Low-bit Vision Transformer · SlidesLive

Taxonomy

Topics: CCD and CMOS Imaging Sensors · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques

Methods: Linear Layer · Dense Connections · Multi-Head Attention · Softmax · Attention Dropout · Feedforward Network · Dropout · Attention Is All You Need · Data-efficient Image Transformer