MimiQ: Low-Bit Data-Free Quantization of Vision Transformers with Encouraging Inter-Head Attention Similarity
Kanghyun Choi, Hye Yoon Lee, Dain Kwon, SunJong Park, Kyuyeun Kim, Noseong Park, Jonghyun Choi, Jinho Lee

TL;DR
MimiQ is a data-free quantization method for vision transformers that improves low-bit performance by increasing inter-head attention similarity in the synthetic data, leading to state-of-the-art results.
Contribution
The paper proposes MimiQ, a novel data-free quantization approach that enhances inter-head attention alignment to improve ViT performance without original training data.
Findings
Significantly outperforms baseline methods in low-bit settings.
Achieves new state-of-the-art in ViT data-free quantization.
Effectively aligns synthetic and real attention maps for better accuracy.
Abstract
Data-free quantization (DFQ) is a technique that creates a lightweight network from its full-precision counterpart without the original training data, often through a synthetic dataset. Although several DFQ methods have been proposed for vision transformer (ViT) architectures, they fail to achieve efficacy in low-bit settings. Examining the existing methods, we observe that their synthetic data produce misaligned attention maps, while those of the real samples are highly aligned. From this observation, we find that aligning attention maps of synthetic data helps improve the overall performance of quantized ViTs. Motivated by this finding, we devise MimiQ, a novel DFQ method designed for ViTs that enhances inter-head attention similarity. First, we generate synthetic data by aligning head-wise attention outputs from each spatial query patch. Then, we align the attention maps of the…
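To make the idea of inter-head attention similarity concrete, the following is a minimal sketch, not the paper's implementation. It assumes the similarity is measured as the mean pairwise cosine similarity between per-head attention outputs for each query patch; the excerpt above does not state the exact metric or loss, and the names `inter_head_similarity`, `alignment_loss`, and the `head_outputs[h][q]` layout are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def inter_head_similarity(head_outputs):
    """Mean pairwise cosine similarity between heads, averaged over query patches.

    Assumed layout (hypothetical): head_outputs[h][q] is the attention output
    vector of head h for query patch q.
    """
    num_heads = len(head_outputs)
    if num_heads < 2:
        raise ValueError("need at least two heads to compare")
    num_queries = len(head_outputs[0])
    total, count = 0.0, 0
    for q in range(num_queries):
        for i in range(num_heads):
            for j in range(i + 1, num_heads):
                total += cosine(head_outputs[i][q], head_outputs[j][q])
                count += 1
    return total / count


def alignment_loss(head_outputs):
    """Assumed objective: lower when heads agree more (1 - mean similarity)."""
    return 1.0 - inter_head_similarity(head_outputs)


# Two heads, two query patches: heads that point the same way vs. orthogonal heads.
aligned = [[[1.0, 2.0], [0.5, 0.5]], [[2.0, 4.0], [1.0, 1.0]]]
misaligned = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
```

Under this assumed metric, `aligned` scores close to 1 and `misaligned` scores 0, so a synthetic-data generator that minimizes `alignment_loss` would be pushed toward samples whose heads agree, which is the behavior the abstract attributes to real samples.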
Taxonomy
Topics: CCD and CMOS Imaging Sensors · Brain Tumor Detection and Classification · Medical Image Segmentation Techniques
Methods: Attention Is All You Need · Residual Connection · Linear Layer · Layer Normalization · Softmax · Dense Connections · Multi-Head Attention · Vision Transformer · ALIGN
