LampQ: Towards Accurate Layer-wise Mixed Precision Quantization for Vision Transformers
Minjun Kim, Jaeri Lee, Jongjin Kim, Jeongin Yun, Yongmo Kwon, U Kang

TL;DR
LampQ is a layer-wise mixed precision quantization method for Vision Transformers. It scores layer sensitivity with a type-aware Fisher metric and assigns bit-widths with integer linear programming, reporting state-of-the-art accuracy with minimal degradation.
Contribution
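The paper proposes a fine-grained, layer-wise mixed precision quantization approach for ViTs. It addresses three limitations of previous methods (coarse granularity, mismatched metric scales across component types, and quantization-unaware bit allocation) with a Fisher-based sensitivity metric and iterative bit-width optimization.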
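To make the bit-width assignment step concrete, here is a minimal sketch of mixed precision allocation as a small integer program. It is not the paper's formulation: the objective (sensitivity times an assumed error proxy of 4^(-bits)), the function name allocate_bits, and the example numbers are all hypothetical, and the exhaustive search stands in for an ILP solver, which would be needed for realistic layer counts.

```python
from itertools import product


def allocate_bits(sensitivity, num_params, choices, avg_bit_budget):
    """Pick one bit-width per layer by exhaustive search (illustrative only).

    Minimizes sum_i sensitivity[i] * 4**(-bits[i]), an assumed error proxy,
    subject to the parameter-weighted average bit-width staying within
    avg_bit_budget. The search is exponential in the number of layers.
    """
    budget_bits = avg_bit_budget * sum(num_params)
    best_cost, best_bits = float("inf"), None
    for bits in product(choices, repeat=len(sensitivity)):
        size = sum(b * n for b, n in zip(bits, num_params))
        if size > budget_bits:
            continue  # violates the model-size constraint
        cost = sum(s * 4.0 ** (-b) for s, b in zip(sensitivity, bits))
        if cost < best_cost:
            best_cost, best_bits = cost, bits
    return best_bits


# Hypothetical per-layer sensitivity scores and parameter counts.
sensitivity = [5.0, 0.1, 1.0]
num_params = [100, 100, 100]
bits = allocate_bits(sensitivity, num_params, choices=(2, 4, 8), avg_bit_budget=5)
print(bits)
```

Under these made-up inputs the most sensitive layer receives the highest precision and the least sensitive layer the lowest, which is the qualitative behavior a metric-based allocation aims for.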
Findings
LampQ outperforms existing quantization methods on various ViT tasks.
It achieves minimal accuracy degradation with significant compression.
It reports state-of-the-art results in quantizing pre-trained Vision Transformers.
Abstract
How can we accurately quantize a pre-trained Vision Transformer model? Quantization algorithms compress Vision Transformers (ViTs) into low-bit formats, reducing memory and computation demands with minimal accuracy degradation. However, existing methods rely on uniform precision, ignoring the diverse sensitivity of ViT components to quantization. Metric-based Mixed Precision Quantization (MPQ) is a promising alternative, but previous MPQ methods for ViTs suffer from three major limitations: 1) coarse granularity, 2) mismatch in metric scale across component types, and 3) quantization-unaware bit allocation. In this paper, we propose LampQ (Layer-wise Mixed Precision Quantization for Vision Transformers), an accurate metric-based MPQ method for ViTs to overcome these limitations. LampQ performs layer-wise quantization to achieve both fine-grained control and efficient acceleration,…
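The scale mismatch across component types mentioned in the abstract can be illustrated with a small sketch. It computes a per-layer empirical Fisher score (the mean squared gradient norm over samples) and then rescales each score by the mean score of its layer type, so that layers of different types become comparable. The per-type mean normalization, the function names, and the toy gradients are assumptions made for illustration; the source does not specify how LampQ's type-aware metric is defined.

```python
from collections import defaultdict


def fisher_score(grad_samples):
    """Empirical diagonal Fisher trace: mean over samples of the squared gradient norm."""
    return sum(sum(g * g for g in grad) for grad in grad_samples) / len(grad_samples)


def type_normalized_scores(layers):
    """layers: list of (name, layer_type, grad_samples); returns {name: score}.

    Each raw score is divided by the mean raw score of its layer type.
    This normalization is a hypothetical choice, not the paper's definition.
    It assumes every type has a nonzero mean score.
    """
    raw = {name: fisher_score(grads) for name, _, grads in layers}
    by_type = defaultdict(list)
    for name, layer_type, _ in layers:
        by_type[layer_type].append(raw[name])
    type_mean = {t: sum(v) / len(v) for t, v in by_type.items()}
    return {name: raw[name] / type_mean[t] for name, t, _ in layers}


# Toy per-sample gradients; the "mlp" gradients are 10x larger in magnitude.
layers = [
    ("attn.0", "attention", [[1.0, 0.0], [0.0, 1.0]]),
    ("attn.1", "attention", [[1.0, 1.0], [2.0, 0.0]]),
    ("mlp.0", "mlp", [[10.0, 0.0], [0.0, 10.0]]),
    ("mlp.1", "mlp", [[10.0, 10.0], [20.0, 0.0]]),
]
scores = type_normalized_scores(layers)
print(scores)
```

In this toy setup the raw scores of the two types differ by a factor of 100, but after normalization the corresponding attention and MLP layers receive the same score, so neither type dominates a downstream bit allocation purely because of its gradient scale.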
Taxonomy
Topics: Advanced Neural Network Applications · CCD and CMOS Imaging Sensors · Advanced Memory and Neural Computing
