MPTQ-ViT: Mixed-Precision Post-Training Quantization for Vision Transformer
Yu-Shan Tai, An-Yeu (Andy) Wu

TL;DR
This paper introduces MPTQ-ViT, a mixed-precision post-training quantization framework for vision transformers that improves accuracy and compressibility by addressing activation asymmetry and automating quantization parameter selection.
Contribution
It proposes two techniques for determining quantization parameters, SQ-b (SmoothQuant with bias term) and OPT-m (optimal scaling factor ratio search), together with a greedy layer-wise bit-width allocation method for mixed precision, advancing low-bit ViT quantization.
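The summary does not describe how the greedy bit-width allocation selects layers, so the following is only a generic sketch of a greedy mixed-precision search, not the authors' procedure. The function name `greedy_bit_allocation`, the `sensitivity` table, the layer names, and the cost-per-saved-bit score are all hypothetical choices made for illustration.

```python
def greedy_bit_allocation(sensitivity, sizes, candidate_bits, target_avg_bits):
    """Lower one layer's bit-width at a time until the size-weighted average meets the target.

    sensitivity[layer][bits] is an assumed, precomputed loss estimate for that layer at that
    bit-width; sizes[layer] is its parameter count. Both are illustrative inputs.
    """
    ordered = sorted(candidate_bits, reverse=True)
    bits = {layer: ordered[0] for layer in sizes}  # start every layer at the highest precision
    total = sum(sizes.values())

    def avg_bits():
        return sum(bits[layer] * sizes[layer] for layer in sizes) / total

    while avg_bits() > target_avg_bits:
        best = None
        for layer in sizes:
            idx = ordered.index(bits[layer])
            if idx + 1 == len(ordered):
                continue  # already at the lowest candidate bit-width
            lower = ordered[idx + 1]
            cost = sensitivity[layer][lower] - sensitivity[layer][bits[layer]]
            saved = (bits[layer] - lower) * sizes[layer]
            score = cost / saved  # hypothetical metric: added loss per saved bit
            if best is None or score < best[0]:
                best = (score, layer, lower)
        if best is None:
            break  # no layer can be lowered further
        _, layer, lower = best
        bits[layer] = lower
    return bits


# Toy inputs (made up for the example, not measured values).
sizes = {"attn": 100, "mlp": 300}
sensitivity = {
    "attn": {8: 0.0, 6: 0.1, 4: 0.9},
    "mlp": {8: 0.0, 6: 0.05, 4: 0.2},
}
allocation = greedy_bit_allocation(sensitivity, sizes, [4, 6, 8], target_avg_bits=5)
print(allocation)
```

With these toy numbers the larger, less sensitive layer is lowered first, which is the general intuition behind sensitivity-driven mixed precision.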
Findings
Achieves up to 23.35% accuracy improvement on 4-bit ViTs.
Significantly enhances performance of 5-bit mixed-precision ViTs.
Demonstrates effectiveness on ViT, DeiT, and Swin models on ImageNet.
Abstract
While vision transformers (ViTs) have shown great potential in computer vision tasks, their intense computation and memory requirements pose challenges for practical applications. Existing post-training quantization methods leverage value redistribution or specialized quantizers to address the non-normal distribution in ViTs. However, without considering the asymmetry in activations and relying on hand-crafted settings, these methods often struggle to maintain performance under low-bit quantization. To overcome these challenges, we introduce SmoothQuant with bias term (SQ-b) to alleviate the asymmetry issue and reduce the clamping loss. We also introduce optimal scaling factor ratio search (OPT-m) to determine quantization parameters by a data-dependent mechanism automatically. To further enhance the compressibility, we incorporate the above-mentioned techniques and propose a…
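To illustrate why activation asymmetry hurts low-bit quantization, here is a minimal NumPy sketch. It assumes the bias term acts as a per-channel shift that is folded into the following linear layer's bias; the abstract does not give the exact formulation, so this is not the authors' SQ-b, and it omits the SmoothQuant channel-wise scaling entirely. The helper names `quantize_symmetric` and `center_activations` are hypothetical.

```python
import numpy as np


def quantize_symmetric(x, n_bits=4):
    """Per-tensor uniform symmetric fake quantization (quantize, then dequantize)."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(x).max() / qmax
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale


def center_activations(x, weight, bias):
    """Shift each channel by its range midpoint and fold the shift into the layer bias.

    Relies on the identity (x - b) @ W + (b @ W + bias) == x @ W + bias.
    Shapes: x (tokens, in_features), weight (in_features, out_features), bias (out_features,).
    """
    shift = (x.max(axis=0) + x.min(axis=0)) / 2.0  # assumed choice of per-channel shift
    return x - shift, bias + shift @ weight


rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=1.0, size=(256, 16))  # synthetic, strongly asymmetric activations
weight = rng.normal(size=(16, 8))
bias = np.zeros(8)

x_centered, bias_folded = center_activations(x, weight, bias)

reference = x @ weight + bias
plain = quantize_symmetric(x) @ weight + bias
shifted = quantize_symmetric(x_centered) @ weight + bias_folded

err_plain = np.mean((plain - reference) ** 2)
err_shifted = np.mean((shifted - reference) ** 2)
print(f"4-bit output MSE without shift: {err_plain:.4f}, with shift: {err_shifted:.4f}")
```

Because the shift is folded into the bias, the full-precision output is unchanged, while the centered activations use the symmetric quantization range more evenly.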
Taxonomy
Topics: CCD and CMOS Imaging Sensors · Infrared Target Detection Methodologies · Advanced Image Fusion Techniques
Methods: Attention Is All You Need · Softmax · Dense Connections · Feedforward Network · Linear Layer · Dropout · Attention Dropout · Multi-Head Attention · Data-efficient Image Transformer
