MPTQ-ViT: Mixed-Precision Post-Training Quantization for Vision Transformer

Yu-Shan Tai; An-Yeu (Andy) Wu

arXiv:2401.14895·cs.CV·February 2, 2024·1 cites

Open Access

TL;DR

This paper introduces MPTQ-ViT, a mixed-precision post-training quantization framework for vision transformers that improves accuracy and compressibility by addressing activation asymmetry and automating quantization parameter selection.

Contribution

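The paper proposes SQ-b (SmoothQuant with a bias term) to reduce activation asymmetry and clamping loss, OPT-m (optimal scaling factor ratio search) to set quantization parameters from data rather than by hand, and a greedy layer-wise bit-width allocation method for mixed precision, advancing low-bit ViT quantization.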
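
To make the idea of greedy layer-wise bit-width allocation concrete, here is a minimal sketch of a generic greedy scheme. It is not the paper's algorithm: the function `greedy_allocate`, the layer names, the `sensitivity` table, and the cost-per-saved-bit criterion are all assumptions chosen for illustration. The sketch starts every layer at the highest bit-width and repeatedly lowers the layer whose next step costs the least per bit saved, until a target average bit-width is reached.

```python
def greedy_allocate(sensitivity, sizes, target_avg_bits, choices=(8, 6, 4)):
    """Generic greedy bit-width allocation (illustrative, not the paper's method).

    sensitivity[layer][bits]: hypothetical loss increase when `layer` uses `bits`.
    sizes[layer]: relative parameter count of the layer.
    choices: allowed bit-widths, ordered from highest to lowest.
    """
    bits = {layer: choices[0] for layer in sizes}
    total = sum(sizes.values())

    def avg_bits():
        # Size-weighted average bit-width over all layers.
        return sum(bits[l] * sizes[l] for l in sizes) / total

    while avg_bits() > target_avg_bits:
        best = None
        for layer in sizes:
            idx = choices.index(bits[layer])
            if idx + 1 == len(choices):
                continue  # this layer is already at the lowest bit-width
            nxt = choices[idx + 1]
            cost = sensitivity[layer][nxt] - sensitivity[layer][bits[layer]]
            saved = (bits[layer] - nxt) * sizes[layer]
            ratio = cost / saved  # loss increase per bit saved
            if best is None or ratio < best[0]:
                best = (ratio, layer, nxt)
        if best is None:
            break  # nothing left to lower; target is unreachable
        bits[best[1]] = best[2]
    return bits


# Hypothetical layers and sensitivities, made up for this example.
sizes = {"qkv": 3, "proj": 1, "fc1": 4, "fc2": 4}
sensitivity = {
    "qkv": {8: 0.0, 6: 0.02, 4: 0.30},
    "proj": {8: 0.0, 6: 0.01, 4: 0.05},
    "fc1": {8: 0.0, 6: 0.01, 4: 0.04},
    "fc2": {8: 0.0, 6: 0.05, 4: 0.60},
}
allocation = greedy_allocate(sensitivity, sizes, target_avg_bits=5.5)
print(allocation)
```

In this toy setup the layer that tolerates low precision best (`fc1`) ends up with the fewest bits, while more sensitive layers keep more. In practice the sensitivity values would have to be measured on calibration data; how the paper measures them is not stated in this summary.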

Findings

1. Achieves up to 23.35% accuracy improvement on 4-bit ViTs.
2. Significantly enhances performance of 5-bit mixed-precision ViTs.
3. Demonstrates effectiveness on ViT, DeiT, and Swin models on ImageNet.

Abstract

While vision transformers (ViTs) have shown great potential in computer vision tasks, their intense computation and memory requirements pose challenges for practical applications. Existing post-training quantization methods leverage value redistribution or specialized quantizers to address the non-normal distribution in ViTs. However, without considering the asymmetry in activations and relying on hand-crafted settings, these methods often struggle to maintain performance under low-bit quantization. To overcome these challenges, we introduce SmoothQuant with bias term (SQ-b) to alleviate the asymmetry issue and reduce the clamping loss. We also introduce optimal scaling factor ratio search (OPT-m) to determine quantization parameters by a data-dependent mechanism automatically. To further enhance the compressibility, we incorporate the above-mentioned techniques and propose a…
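
The asymmetry issue mentioned in the abstract can be illustrated with a toy example. The sketch below is not the paper's SQ-b; it only assumes a symmetric uniform quantizer and a hypothetical activation channel whose values lie mostly on the positive side. Subtracting a bias (here, the midpoint of the observed range) before quantizing and adding it back afterwards lets the same number of bits cover the range with a finer step. All function names and the data are made up for illustration.

```python
import random


def quantize_symmetric(values, bits, scale):
    """Symmetric uniform quantizer: round to the grid, clamp, dequantize."""
    qmax = 2 ** (bits - 1) - 1
    qmin = -(2 ** (bits - 1))
    out = []
    for v in values:
        q = round(v / scale)
        q = max(qmin, min(qmax, q))  # clamp to the representable range
        out.append(q * scale)
    return out


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def quant_error(values, bits, bias=0.0):
    """Quantization MSE when `bias` is subtracted before quantizing.

    The scale is chosen so the largest shifted magnitude maps to qmax.
    """
    shifted = [v - bias for v in values]
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(s) for s in shifted) / qmax
    restored = [q + bias for q in quantize_symmetric(shifted, bits, scale)]
    return mse(values, restored)


random.seed(0)
# Hypothetical asymmetric activation channel with values in about [-0.5, 7.5].
acts = [random.uniform(-0.5, 7.5) for _ in range(2000)]
center = (max(acts) + min(acts)) / 2

err_plain = quant_error(acts, bits=4)                 # no bias shift
err_shifted = quant_error(acts, bits=4, bias=center)  # range centred first
print(err_plain, err_shifted)
```

Without the shift, nearly half of the 4-bit grid is spent on negative values that rarely occur, so the step size is about twice as large. If the scale were instead fixed to the finer step, the same asymmetry would show up as clamping of the large positive values.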

Taxonomy

Topics: CCD and CMOS Imaging Sensors · Infrared Target Detection Methodologies · Advanced Image Fusion Techniques

Methods: Attention Is All You Need · Softmax · Dense Connections · Feedforward Network · Linear Layer · Dropout · Attention Dropout · Multi-Head Attention · Data-efficient Image Transformer