Joint Post-Training Quantization of Vision Transformers with Learned Prompt-Guided Data Generation

Shile Li; Markus Karmann; Onay Urfalioglu

arXiv:2602.18861 · cs.CV · February 24, 2026

Open Access

TL;DR

This paper introduces a joint post-training quantization method for Vision Transformers, paired with a data-free calibration strategy that generates samples from learned prompts. It reports state-of-the-art W4A4 and W3A3 accuracy on ImageNet and strong accuracy at W1.58A8, with efficient edge deployment as the target.

Contribution

The paper proposes an end-to-end quantization framework that jointly optimizes all layers without labeled data, and introduces a data-free calibration strategy in which learned multi-mode prompts guide Stable Diffusion Turbo to synthesize calibration samples.

Findings

01. Achieves state-of-the-art W4A4 and W3A3 accuracy on ImageNet.

02. Maintains strong accuracy on ViT, DeiT, and Swin-T under W1.58A8 quantization.

03. Completes quantization in about one hour on a single GPU for ViT-small.
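
The WxAy labels in the findings above give the bit widths used for weights (W) and activations (A). As a rough illustration of what those numbers mean, the NumPy sketch below fake-quantizes a random tensor onto a 4-bit grid and onto a ternary grid. It is not the paper's method: the symmetric rounding scheme, the absmean scale, and the function names `quantize_uniform` and `quantize_ternary` are all assumptions made for this example. Reading W1.58 as ternary weights is likewise an assumption, based on the fact that three levels carry log2(3) ≈ 1.58 bits.

```python
import math
import numpy as np


def quantize_uniform(x, bits):
    """Symmetric uniform fake-quantization (assumed scheme, not the paper's).

    Values are rounded onto a signed integer grid and scaled back to floats.
    """
    qmax = 2 ** (bits - 1) - 1                     # e.g. 7 for 4 bits
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale


def quantize_ternary(w):
    """Ternary fake-quantization: each weight becomes -s, 0, or +s.

    The absmean scale s is an assumed choice for illustration.
    """
    scale = float(np.mean(np.abs(w)))
    if scale == 0:
        return np.zeros_like(w)
    q = np.clip(np.round(w / scale), -1, 1)
    return q * scale


rng = np.random.default_rng(0)
w = rng.standard_normal((8, 8))                    # stand-in weight matrix
a = rng.standard_normal((4, 8))                    # stand-in activations

w4 = quantize_uniform(w, bits=4)                   # the "W4" in W4A4
a4 = quantize_uniform(a, bits=4)                   # the "A4" in W4A4
w_ternary = quantize_ternary(w)                    # three levels per weight

# Information content of one ternary weight, in bits.
bits_per_ternary_weight = math.log2(3)
```

A 4-bit tensor can take at most 16 distinct values and a ternary tensor at most 3, which is why accuracy is hard to preserve at these settings without careful calibration.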

Abstract

We present a framework for end-to-end joint quantization of Vision Transformers trained on ImageNet for the purpose of image classification. Unlike prior post-training or block-wise reconstruction methods, we jointly optimize over the entire set of all layers and inter-block dependencies without any labeled data, scaling effectively with the number of samples and completing in just one hour on a single GPU for ViT-small. We achieve state-of-the-art W4A4 and W3A3 accuracies on ImageNet and, to the best of our knowledge, the first PTQ results that maintain strong accuracy on ViT, DeiT, and Swin-T models under extremely low-bit settings (W1.58A8), demonstrating the potential for efficient edge deployment. Furthermore, we introduce a data-free calibration strategy that synthesizes diverse, label-free samples using Stable Diffusion Turbo guided by learned multi-mode prompts. By encouraging…

Taxonomy

Topics: Advanced Neural Network Applications · Image Enhancement Techniques · Generative Adversarial Networks and Image Synthesis