Joint Post-Training Quantization of Vision Transformers with Learned Prompt-Guided Data Generation
Shile Li, Markus Karmann, Onay Urfalioglu

TL;DR
This paper introduces a joint post-training quantization (PTQ) method for Vision Transformers that combines end-to-end optimization with data-free sample generation guided by learned prompts. It reaches state-of-the-art W4A4 and W3A3 accuracy on ImageNet and retains strong accuracy at W1.58A8, targeting efficient edge deployment.
Contribution
It proposes an end-to-end quantization framework that jointly optimizes all layers and inter-block dependencies without labeled data, and introduces a data-free calibration strategy that synthesizes label-free samples with Stable Diffusion Turbo guided by learned multi-mode prompts.
Findings
Achieves state-of-the-art W4A4 and W3A3 accuracy on ImageNet.
Maintains strong accuracy on ViT, DeiT, and Swin-T under W1.58A8 quantization.
Completes quantization in just one hour on a single GPU for ViT-small.
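The bit-width labels above follow the usual WxAy convention: x bits for weights, y bits for activations. W1.58 refers to ternary weights, since three levels carry log2(3) ≈ 1.58 bits. The sketch below is only an illustration of that notation, not the paper's method: `quantize_symmetric` and `quantize_ternary` are hypothetical helper names, and the per-tensor max-abs and mean-abs scales are assumptions chosen for simplicity.

```python
import math

def quantize_symmetric(values, bits):
    """Uniform symmetric fake quantization: snap to a signed integer grid, then rescale.
    Assumption: one per-tensor scale derived from the largest absolute value."""
    qmax = 2 ** (bits - 1) - 1                      # e.g. 7 for 4 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = []
    for v in values:
        q = max(-qmax - 1, min(qmax, round(v / scale)))  # clamp to the integer range
        quantized.append(q * scale)
    return quantized

def quantize_ternary(values):
    """Ternary quantization to {-1, 0, +1} times a scale.
    Assumption: the scale is the mean absolute value of the tensor."""
    scale = sum(abs(v) for v in values) / len(values)
    if scale == 0:
        return [0.0 for _ in values]
    return [max(-1, min(1, round(v / scale))) * scale for v in values]

weights = [0.9, -0.05, 0.4, -0.7, 0.02, -0.3]      # toy weights, not from the paper
w4 = quantize_symmetric(weights, 4)                 # at most 2**4 = 16 distinct levels
w_ternary = quantize_ternary(weights)               # at most 3 distinct levels
bits_per_ternary_weight = math.log2(3)              # information content of 3 levels
```

With a 4-bit grid each weight moves by at most half a quantization step, whereas the ternary version keeps only sign and a coarse magnitude, which is why maintaining accuracy at W1.58 is the harder setting.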
Abstract
We present a framework for end-to-end joint quantization of Vision Transformers trained on ImageNet for the purpose of image classification. Unlike prior post-training or block-wise reconstruction methods, we jointly optimize over the entire set of all layers and inter-block dependencies without any labeled data, scaling effectively with the number of samples and completing in just one hour on a single GPU for ViT-small. We achieve state-of-the-art W4A4 and W3A3 accuracies on ImageNet and, to the best of our knowledge, the first PTQ results that maintain strong accuracy on ViT, DeiT, and Swin-T models under extremely low-bit settings (W1.58A8), demonstrating the potential for efficient edge deployment. Furthermore, we introduce a data-free calibration strategy that synthesizes diverse, label-free samples using Stable Diffusion Turbo guided by learned multi-mode prompts. By encouraging…
Taxonomy
Topics: Advanced Neural Network Applications · Image Enhancement Techniques · Generative Adversarial Networks and Image Synthesis
