LampQ: Towards Accurate Layer-wise Mixed Precision Quantization for Vision Transformers

Minjun Kim; Jaeri Lee; Jongjin Kim; Jeongin Yun; Yongmo Kwon; U Kang

arXiv:2511.10004 · cs.CV · November 17, 2025

Open Access

TL;DR

LampQ is a layer-wise mixed precision quantization method for Vision Transformers. It scores layer sensitivity with a type-aware Fisher-based metric and assigns bit-widths with integer linear programming, reaching state-of-the-art accuracy when quantizing pre-trained ViTs.

Contribution

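The paper proposes a fine-grained, layer-wise mixed precision quantization approach for ViTs. It addresses the limitations of previous methods (coarse granularity, mismatched metric scales across component types, and quantization-unaware bit allocation) with a Fisher-based sensitivity metric and iterative bit-width optimization.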

Findings

01. LampQ outperforms existing quantization methods on various ViT tasks.

02. It achieves minimal accuracy degradation with significant compression.

03. It achieves state-of-the-art results in quantizing pre-trained Vision Transformers.

Abstract

How can we accurately quantize a pre-trained Vision Transformer model? Quantization algorithms compress Vision Transformers (ViTs) into low-bit formats, reducing memory and computation demands with minimal accuracy degradation. However, existing methods rely on uniform precision, ignoring the diverse sensitivity of ViT components to quantization. Metric-based Mixed Precision Quantization (MPQ) is a promising alternative, but previous MPQ methods for ViTs suffer from three major limitations: 1) coarse granularity, 2) mismatch in metric scale across component types, and 3) quantization-unaware bit allocation. In this paper, we propose LampQ (Layer-wise Mixed Precision Quantization for Vision Transformers), an accurate metric-based MPQ method for ViTs to overcome these limitations. LampQ performs layer-wise quantization to achieve both fine-grained control and efficient acceleration,…
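The abstract frames metric-based MPQ as choosing one bit-width per layer from a sensitivity score. As a rough illustration only, the sketch below solves a toy version of that allocation problem. It is not the authors' algorithm: the sensitivity scores and parameter counts are made-up numbers, the assumption that quantization error shrinks as 4^(-bits) is a common rule of thumb rather than something stated here, and an exhaustive search stands in for an integer linear programming solver so that the example needs no external packages.

```python
from itertools import product


def allocate_bits(sensitivity, num_params, choices, budget_bits):
    """Pick one bit-width per layer by brute force (toy stand-in for an ILP).

    Minimizes sum(s_l * 4**(-b_l)) subject to sum(n_l * b_l) <= budget_bits.
    The 4**(-b) error model is an assumption made for this illustration.
    """
    best_cost, best_bits = float("inf"), None
    for bits in product(choices, repeat=len(sensitivity)):
        size = sum(n * b for n, b in zip(num_params, bits))
        if size > budget_bits:
            continue  # violates the model-size constraint
        cost = sum(s * 4.0 ** (-b) for s, b in zip(sensitivity, bits))
        if cost < best_cost:
            best_cost, best_bits = cost, bits
    return best_bits, best_cost


# Hypothetical per-layer sensitivity scores and parameter counts.
sensitivity = [8.0, 1.0, 0.5, 4.0]
num_params = [500, 2000, 2000, 500]
choices = (2, 4, 8)

# Budget equal to the size of a uniform 4-bit model.
budget = 4 * sum(num_params)

bits, cost = allocate_bits(sensitivity, num_params, choices, budget)
uniform_cost = sum(s * 4.0 ** (-4) for s in sensitivity)
print("bit-widths:", bits)
print("mixed cost:", cost, "uniform 4-bit cost:", uniform_cost)
```

Under the same size budget, the mixed assignment gives the small, sensitive layers more bits and the large, insensitive layer fewer, which lowers the modeled error relative to uniform 4-bit precision. Exhaustive search scales exponentially in the number of layers, so a real model would need an actual integer programming solver.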

Taxonomy

Topics: Advanced Neural Network Applications · CCD and CMOS Imaging Sensors · Advanced Memory and Neural Computing