MimiQ: Low-Bit Data-Free Quantization of Vision Transformers with Encouraging Inter-Head Attention Similarity

Kanghyun Choi; Hye Yoon Lee; Dain Kwon; SunJong Park; Kyuyeun Kim; Noseong Park; Jonghyun Choi; Jinho Lee

arXiv:2407.20021·cs.LG·April 15, 2025

TL;DR

MimiQ is a data-free quantization method for vision transformers that improves low-bit accuracy by generating synthetic data whose attention maps are similar across heads, reaching state-of-the-art results in data-free ViT quantization.

Contribution

The paper proposes MimiQ, a novel data-free quantization approach that enhances inter-head attention alignment to improve ViT performance without original training data.

Findings

01 Significantly outperforms baseline methods in low-bit settings.

02 Achieves new state-of-the-art in ViT data-free quantization.

03 Effectively aligns synthetic and real attention maps for better accuracy.

Abstract

Data-free quantization (DFQ) is a technique that creates a lightweight network from its full-precision counterpart without the original training data, often through a synthetic dataset. Although several DFQ methods have been proposed for vision transformer (ViT) architectures, they fail to achieve efficacy in low-bit settings. Examining the existing methods, we observe that their synthetic data produce misaligned attention maps, while those of the real samples are highly aligned. From this observation, we find that aligning attention maps of synthetic data helps improve the overall performance of quantized ViTs. Motivated by this finding, we devise MimiQ, a novel DFQ method designed for ViTs that enhances inter-head attention similarity. First, we generate synthetic data by aligning head-wise attention outputs from each spatial query patch. Then, we align the attention maps of the…
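To make the notion of "aligned" attention maps concrete, the following is a minimal sketch of one way to score inter-head attention similarity for a single layer. It is an illustration only: the use of cosine similarity, the function names `cosine` and `inter_head_similarity`, and the nested-list layout `attn[head][query]` are assumptions made here, not the metric or code used by the authors, which this excerpt does not specify.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def inter_head_similarity(attn):
    """Mean pairwise similarity between heads, averaged over query patches.

    attn[h][q] is the attention row (weights over key patches) that head h
    produces for query patch q. Cosine similarity is an assumed stand-in for
    whatever similarity measure the paper actually uses.
    """
    num_heads = len(attn)
    num_queries = len(attn[0])
    scores = []
    for q in range(num_queries):
        # Compare every unordered pair of heads for the same query patch.
        for h1, h2 in combinations(range(num_heads), 2):
            scores.append(cosine(attn[h1][q], attn[h2][q]))
    return sum(scores) / len(scores)


# Hypothetical toy inputs: two heads, one query patch, three key patches.
aligned = [[[0.7, 0.2, 0.1]], [[0.7, 0.2, 0.1]]]
misaligned = [[[1.0, 0.0, 0.0]], [[0.0, 0.0, 1.0]]]
print(inter_head_similarity(aligned), inter_head_similarity(misaligned))
```

Under this assumed score, heads that attend to the same key patches for a given query score close to 1, while heads that attend to disjoint patches score 0. A synthetic-data generator in the spirit of the abstract would push this kind of score upward; how MimiQ does so is not recoverable from the truncated abstract.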

Taxonomy

Topics: CCD and CMOS Imaging Sensors · Brain Tumor Detection and Classification · Medical Image Segmentation Techniques

Methods: Attention Is All You Need · Residual Connection · Linear Layer · Layer Normalization · Softmax · Dense Connections · Multi-Head Attention · Vision Transformer · ALIGN