Fine-Grained Post-Training Quantization for Large Vision Language Models with Quantization-Aware Integrated Gradients

Ziwei Xiang; Fanhu Zeng; Hongjian Fang; Rui-Qi Wang; Renxing Chen; Yanan Zhu; Yi Chen; Peipei Yang; Xu-Yao Zhang

arXiv:2603.17809 · cs.CV · March 19, 2026

TL;DR

This paper introduces a fine-grained post-training quantization method for large vision-language models (LVLMs). It uses integrated gradients to score quantization sensitivity per token rather than per modality, improving low-bit accuracy with minimal latency overhead.

Contribution

The paper proposes quantization-aware integrated gradients, an attribution-based measure of token-level sensitivity, so that post-training quantization of LVLMs can be calibrated at a finer granularity than the modality level.
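
To make the idea of token-level sensitivity concrete, the following is a minimal sketch of standard integrated gradients on a toy model. It is not the paper's formulation, which this page does not show. The assumptions are: the attribution path runs from fake-quantized token features (the baseline) to their full-precision values, the model is a scalar tanh over summed token projections, and every name (`quantize`, `model`, `token_attributions`, the weights `w`) is hypothetical.

```python
import math

def quantize(x, bits=3, x_max=1.0):
    """Symmetric uniform fake-quantization of one scalar (illustrative only)."""
    levels = 2 ** (bits - 1) - 1
    scale = x_max / levels
    q = max(-levels, min(levels, round(x / scale)))
    return q * scale

def _pre_activation(tokens, w):
    # Sum of per-token projections w . x_t over all tokens.
    return sum(sum(wi * xi for wi, xi in zip(w, tok)) for tok in tokens)

def model(tokens, w):
    """Toy scalar model (an assumption, standing in for a real LVLM output)."""
    return math.tanh(_pre_activation(tokens, w))

def model_grad(tokens, w):
    """Analytic gradient of `model` with respect to every token feature."""
    g = 1.0 - math.tanh(_pre_activation(tokens, w)) ** 2
    return [[g * wi for wi in w] for _ in tokens]

def token_attributions(tokens, baseline, w, steps=200):
    """Integrated gradients along the straight path baseline -> tokens.

    Returns one score per token: the feature-wise attributions summed
    over that token's feature dimension.
    """
    n, d = len(tokens), len(w)
    avg_grad = [[0.0] * d for _ in range(n)]
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint rule for the path integral
        point = [[b + alpha * (x - b) for x, b in zip(tok, base)]
                 for tok, base in zip(tokens, baseline)]
        grad = model_grad(point, w)
        for t in range(n):
            for j in range(d):
                avg_grad[t][j] += grad[t][j] / steps
    return [sum((tokens[t][j] - baseline[t][j]) * avg_grad[t][j]
                for j in range(d)) for t in range(n)]

# Hypothetical data: three tokens with two features each.
w = [0.8, -0.5]
tokens = [[0.9, -0.3], [0.05, 0.4], [-0.6, 0.7]]
baseline = [[quantize(x) for x in tok] for tok in tokens]
scores = token_attributions(tokens, baseline, w)
```

By the completeness axiom of integrated gradients, the per-token scores sum to the output change caused by quantization, `model(tokens, w) - model(baseline, w)`, so each score can be read as that token's share of the quantization error. In this toy model the gradient is identical for every token, so scores differ only through each token's quantization residual. A real LVLM would add cross-token interactions through attention.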

Findings

01 Improves the accuracy of LVLMs under low-bit quantization.

02 Achieves near full-precision performance with minimal latency overhead.

03 Demonstrates effectiveness across multiple models and settings.

Abstract

Large Vision Language Models (LVLMs) have achieved remarkable success in a range of downstream tasks that require multimodal interaction, but their capabilities come with substantial computational and memory overhead, which hinders practical deployment. Among numerous acceleration techniques, post-training quantization is a popular and effective strategy for reducing memory cost and accelerating inference. However, existing LVLM quantization methods typically measure token sensitivity at the modality level, which fails to capture the complex cross-token interactions and falls short in quantitatively measuring the quantization error at the token level. As tokens interact within the model, the distinction between modalities gradually diminishes, suggesting the need for fine-grained calibration. Inspired by axiomatic attribution in mechanistic interpretability, we introduce a fine-grained…

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications