Unveiling Uncertainty: A Deep Dive into Calibration and Performance of Multimodal Large Language Models

Zijun Chen; Wenbo Hu; Guande He; Zhijie Deng; Zheng Zhang; Richang Hong

arXiv:2412.14660·cs.CV·December 30, 2024

TL;DR

This paper evaluates the calibration of multimodal large language models, identifies their miscalibration issues, and proposes techniques like temperature scaling and prompt optimization to improve their reliability in multimodal tasks.
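Temperature scaling, one of the techniques named above, divides a model's logits by a single scalar T before the softmax. The following is a minimal sketch under stated assumptions, not the paper's implementation: the function names (`softmax`, `nll`, `fit_temperature`), the grid search, and the toy logits are all hypothetical and chosen only for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax over logits divided by a temperature (T > 1 flattens the distribution)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def nll(logit_rows, labels, temperature):
    """Mean negative log-likelihood of the true labels at a given temperature."""
    loss = 0.0
    for logits, y in zip(logit_rows, labels):
        loss -= math.log(softmax(logits, temperature)[y])
    return loss / len(labels)

def fit_temperature(logit_rows, labels, grid=None):
    """Pick the temperature on a grid that minimizes validation NLL.

    A grid search is an assumption made for simplicity; gradient-based
    optimizers are also commonly used for this one-parameter fit.
    """
    if grid is None:
        grid = [0.5 + 0.05 * i for i in range(91)]  # 0.5 to 5.0
    return min(grid, key=lambda t: nll(logit_rows, labels, t))

# Toy validation set (hypothetical): the model always predicts class 0
# with high confidence but is right only 3 times out of 4.
val_logits = [[4.0, 0.0], [4.0, 0.0], [4.0, 0.0], [4.0, 0.0]]
val_labels = [0, 0, 0, 1]

T = fit_temperature(val_logits, val_labels)
calibrated = softmax(val_logits[0], T)
```

On this toy set the fitted temperature is greater than 1, so the scaled confidence moves down toward the observed 75% accuracy. Because a single positive scalar is applied to all logits, the predicted class is unchanged; only the confidence is adjusted.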

Contribution

The paper constructs the IDK (I don't know) dataset for evaluating how well MLLMs self-assess their uncertainty, and proposes calibration techniques, such as temperature scaling and prompt optimization, to improve their reliability.

Findings

1. MLLMs show miscalibration across scenarios
2. Uncertainty differs between text and images
3. Calibration improves with prompt adjustments

Abstract

Multimodal large language models (MLLMs) combine visual and textual data for tasks such as image captioning and visual question answering. Proper uncertainty calibration is crucial, yet challenging, for reliable use in areas like healthcare and autonomous driving. This paper investigates representative MLLMs, focusing on their calibration across various scenarios, including before and after visual fine-tuning, as well as before and after multimodal training of the base LLMs. We observed miscalibration in their performance, and at the same time, no significant differences in calibration across these scenarios. We also highlight how uncertainty differs between text and images and how their integration affects overall uncertainty. To better understand MLLMs' miscalibration and their ability to self-assess uncertainty, we construct the IDK (I don't know) dataset, which is key to evaluating…
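The abstract excerpt does not say which metric is used to quantify miscalibration. A widely used one is the expected calibration error (ECE), which bins predictions by confidence and averages the gap between confidence and accuracy in each bin. The sketch below assumes ECE with equal-width bins; the function name and the toy data are hypothetical.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width confidence bins.

    confidences: predicted probability of the chosen answer, each in [0, 1]
    correct:     booleans, True where the chosen answer was right
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 falls into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        # Weight each bin's gap by the fraction of samples it holds.
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece

# Toy example (hypothetical): 90% confidence on every answer, 75% accuracy.
ece_overconfident = expected_calibration_error(
    [0.9, 0.9, 0.9, 0.9], [True, True, True, False]
)
```

Here every prediction lands in the same bin, so the ECE is simply the gap between the mean confidence (0.90) and the accuracy (0.75). A perfectly calibrated model would score 0.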

Code & Models

Repositories

hfutml/calibration-mllm (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: Balanced Selection