Mix-QViT: Mixed-Precision Vision Transformer Quantization Driven by Layer Importance and Quantization Sensitivity

Navin Ranjan; Andreas Savakis

arXiv:2501.06357·cs.CV·January 14, 2025

TL;DR

Mix-QViT is an explainability-driven mixed-precision quantization framework for Vision Transformers. It assigns each layer a bit-width based on that layer's importance and quantization sensitivity, improving accuracy at low bit-widths.

Contribution

The paper presents a novel explainability-based approach for layer-wise mixed-precision quantization of Vision Transformers, including a new clipped channel-wise method for post-training quantization.

Findings

1. Outperforms existing PTQ methods at 3-, 4-, and 6-bit precisions.

2. Achieves superior quantization-aware training results at 2-bit precision.

3. Effective across ViT, DeiT, and Swin Transformer models.

Abstract

In this paper, we propose Mix-QViT, an explainability-driven mixed-precision quantization (MPQ) framework that systematically allocates bit-widths to each layer based on two criteria: layer importance, assessed via Layer-wise Relevance Propagation (LRP), which identifies how much each layer contributes to the final classification, and quantization sensitivity, determined by evaluating the performance impact of quantizing each layer at various precision levels while keeping the other layers at a baseline precision. Additionally, for post-training quantization (PTQ), we introduce a clipped channel-wise quantization method designed to reduce the effects of extreme outliers in post-LayerNorm activations by removing severe inter-channel variations. We validate our approach by applying Mix-QViT to ViT, DeiT, and Swin Transformer models across multiple datasets. Our experimental results for PTQ demonstrate that both fixed-bit and…
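
The abstract does not spell out the clipping rule, so the following is only a minimal NumPy sketch of the general idea behind clipped channel-wise quantization, not the authors' implementation. It assumes activations of shape (tokens, channels), per-channel percentile bounds as the clipping rule, and an asymmetric uniform quantizer; the function name `clipped_channelwise_quantize` and the parameters `n_bits` and `clip_pct` are hypothetical.

```python
import numpy as np

def clipped_channelwise_quantize(x, n_bits=4, clip_pct=99.0):
    """Fake-quantize activations x of shape (tokens, channels).

    Assumption: clipping bounds are per-channel percentiles. The paper's
    actual clipping rule may differ.
    """
    # Per-channel bounds that ignore the most extreme values in each channel.
    lo = np.percentile(x, 100.0 - clip_pct, axis=0, keepdims=True)
    hi = np.percentile(x, clip_pct, axis=0, keepdims=True)
    x_clipped = np.clip(x, lo, hi)

    # Per-channel asymmetric uniform quantizer with 2**n_bits levels.
    levels = 2 ** n_bits - 1
    scale = np.maximum(hi - lo, 1e-8) / levels
    q = np.round((x_clipped - lo) / scale)  # integer codes in [0, levels]
    return q * scale + lo                   # dequantized values

# Synthetic stand-in for post-LayerNorm activations (not real model data):
# one channel has a much larger range, another holds a single extreme outlier.
rng = np.random.default_rng(0)
x = rng.normal(size=(512, 8))
x[:, 3] *= 50.0
x[0, 5] = 1e3

xq = clipped_channelwise_quantize(x, n_bits=4)
print("mean abs error, channel 0:", np.abs(xq[:, 0] - x[:, 0]).mean())
print("quantized outlier value:", xq[0, 5])
```

Because each channel gets its own range, a wide channel no longer dictates the step size of narrow channels, and clipping keeps a single outlier from stretching a channel's range. The trade-off is that clipped values are saturated rather than represented exactly.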

Taxonomy

Methods: Attention Is All You Need · Absolute Position Encodings · Attention Dropout · Adam · Residual Connection · Feedforward Network · Dropout · Softmax · Byte Pair Encoding · Linear Layer