Evaluating the Impact of Post-Training Quantization on Reliable VQA with Multimodal LLMs
Paul Jonas Kurz, Tobias Jan Wieczorek, Mohamed A. Abdelsalam, Rahaf Aljundi, Marcus Rohrbach

TL;DR
This paper systematically examines how post-training quantization affects the accuracy and reliability of multimodal large language models in visual question answering, proposing methods to mitigate reliability loss and optimize efficiency.
Contribution
This is the first systematic study linking quantization effects to reliability in multimodal models. It adapts the Selector confidence estimator to quantized multimodal settings and demonstrates quantization strategies that preserve reliability.
Findings
Post-training quantization (PTQ) degrades both accuracy and reliability in VQA models.
Data-aware quantization methods mitigate some reliability loss.
Combining int4 MBQ with the Selector achieves near-uncompressed performance with 75% less memory.
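The 75% figure is consistent with simple bit-width arithmetic, sketched below under assumptions that are not stated in the text above: a 16-bit baseline, a hypothetical 7-billion-parameter model, and no accounting for quantization metadata (scales, zero points) or for any components left unquantized. The function names are illustrative only.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (10^9 bytes), ignoring quantization metadata."""
    return num_params * bits_per_weight / 8 / 1e9


def memory_reduction(baseline_bits: float, quant_bits: float) -> float:
    """Fraction of weight memory saved when moving from baseline_bits to quant_bits."""
    return 1.0 - quant_bits / baseline_bits


# Assumed values for illustration: 7e9 parameters, 16-bit baseline, 4-bit quantized weights.
params = 7e9
fp16_gb = weight_memory_gb(params, 16)
int4_gb = weight_memory_gb(params, 4)
reduction = memory_reduction(16, 4)

print(f"16-bit: {fp16_gb:.1f} GB, int4: {int4_gb:.1f} GB, saved: {reduction:.0%}")
```

In practice the saving is somewhat smaller than this idealized ratio, because group-wise scales and zero points add overhead per block of weights.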
Abstract
Multimodal Large Language Models (MLLMs) are increasingly deployed in domains where both reliability and efficiency are critical. However, current models remain overconfident, producing highly certain but incorrect answers. At the same time, their large size limits deployment on edge devices, necessitating compression. We study the intersection of these two challenges by analyzing how Post-Training Quantization (PTQ) compression affects both accuracy and reliability in Visual Question Answering (VQA). We evaluate two MLLMs, Qwen2-VL-7B and Idefics3-8B, quantized with data-free (HQQ) and data-aware (MBQ) methods across multiple bit widths. To counteract the reduction in reliability caused by quantization, we adapt the Selector confidence estimator for quantized multimodal settings and test its robustness across various quantization levels and out-of-distribution (OOD) scenarios. We find…
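Reliability in this setting is commonly measured through selective prediction: the model answers only when a confidence score clears a threshold and abstains otherwise. The sketch below computes coverage (fraction of questions answered) and risk (error rate among answered questions) for a given threshold. It is a generic illustration, not the paper's Selector or its evaluation protocol; the function name, the confidence scores, and the correctness labels are all made up for the example.

```python
from typing import Sequence, Tuple


def coverage_and_risk(
    confidences: Sequence[float],
    correct: Sequence[bool],
    threshold: float,
) -> Tuple[float, float]:
    """Return (coverage, risk) when answering only if confidence >= threshold."""
    if len(confidences) == 0 or len(confidences) != len(correct):
        raise ValueError("inputs must be non-empty and of equal length")
    # Keep the correctness flags of the questions the model chooses to answer.
    answered = [ok for conf, ok in zip(confidences, correct) if conf >= threshold]
    coverage = len(answered) / len(confidences)
    # Risk is the error rate on answered questions; defined as 0 if nothing is answered.
    risk = 0.0 if not answered else 1.0 - sum(answered) / len(answered)
    return coverage, risk


# Hypothetical scores: an overconfident model assigns 0.90 to a wrong answer.
confidences = [0.95, 0.90, 0.60, 0.85, 0.30]
correct = [True, False, True, True, False]
coverage, risk = coverage_and_risk(confidences, correct, threshold=0.8)
print(f"coverage={coverage:.2f}, risk={risk:.2f}")
```

A better-calibrated confidence estimator lowers risk at the same coverage, which is the kind of trade-off a comparison across quantization levels would examine.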
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications
