Calibrating Uncertainty Quantification of Multi-Modal LLMs using Grounding
Trilok Padhi, Ramneet Kaur, Adam D. Cobb, Manoj Acharya, Anirban Roy, Colin Samplawski, Brian Matejek, Alexander M. Berenbeim, Nathaniel D. Bastian, Susmit Jha

TL;DR
This paper presents a calibration method for multi-modal large language models that uses cross-modal grounding and temperature scaling to better align confidence with accuracy on tasks such as medical and visual question answering.
Contribution
It introduces a novel calibration approach combining grounding confidence with temperature scaling to enhance multi-modal LLMs' uncertainty estimates.
Findings
Significantly improved calibration on medical question answering.
Better alignment of confidence with accuracy in visual question answering.
Effective use of grounding to calibrate multi-modal responses.
Abstract
We introduce a novel approach for calibrating uncertainty quantification (UQ) tailored for multi-modal large language models (LLMs). Existing state-of-the-art UQ methods rely on consistency among multiple responses generated by the LLM for an input query under diverse settings. However, these approaches often report high confidence in scenarios where the LLM is consistently incorrect, which leaves confidence poorly calibrated with respect to accuracy. To address this, we leverage cross-modal consistency in addition to self-consistency to improve the calibration of multi-modal models. Specifically, we ground the textual responses in the visual inputs, and the confidence from the grounding model is used to calibrate the overall confidence. Because the grounding model adds its own uncertainty to the pipeline, we apply temperature scaling, a widely accepted parametric…
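The pipeline described in the abstract can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the authors' implementation: self-consistency confidence is taken as the fraction of sampled answers that agree with the majority answer; the grounding model's score (`grounding_conf`, assumed to be a probability in (0, 1)) is recalibrated with binary temperature scaling, with the temperature fitted by minimizing negative log-likelihood on a held-out set; and the two signals are merged with a weighted average. The weighted average, the `weight` parameter, the search grid, and all function names are assumptions made for illustration, since the abstract does not state how the signals are combined.

```python
import math
from collections import Counter
from typing import Optional, Sequence, Tuple


def self_consistency(responses: Sequence[str]) -> Tuple[str, float]:
    """Majority answer among sampled responses and the fraction that agree with it."""
    counts = Counter(r.strip().lower() for r in responses)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(responses)


def _logit(p: float, eps: float = 1e-6) -> float:
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid infinities at 0 and 1
    return math.log(p / (1.0 - p))


def temperature_scale(p: float, temperature: float) -> float:
    """Binary temperature scaling: sigmoid(logit(p) / T). T > 1 softens, T < 1 sharpens."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    return 1.0 / (1.0 + math.exp(-_logit(p) / temperature))


def fit_temperature(probs: Sequence[float], labels: Sequence[int],
                    grid: Optional[Sequence[float]] = None) -> float:
    """Choose T minimizing mean negative log-likelihood on held-out data (grid search)."""
    if grid is None:
        grid = [0.25 + 0.05 * i for i in range(96)]  # hypothetical grid: 0.25 to 5.0

    def nll(t: float) -> float:
        total = 0.0
        for p, y in zip(probs, labels):
            q = min(max(temperature_scale(p, t), 1e-12), 1.0 - 1e-12)
            total -= y * math.log(q) + (1 - y) * math.log(1.0 - q)
        return total / len(probs)

    return min(grid, key=nll)


def calibrated_confidence(responses: Sequence[str], grounding_conf: float,
                          temperature: float, weight: float = 0.5) -> Tuple[str, float]:
    """Hypothetical combination: weighted average of self-consistency and scaled grounding."""
    answer, sc = self_consistency(responses)
    g = temperature_scale(grounding_conf, temperature)
    return answer, weight * sc + (1.0 - weight) * g
```

In this sketch, a grounding model that reports 0.9 on a validation set where only 70% of its answers are correct gets a fitted temperature near 2.6, which pulls its scaled score down toward 0.7 before it is mixed with the self-consistency vote. Temperature scaling is used here because it has a single parameter and does not change which answer is preferred, only how confident the score is.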
Taxonomy
Topics: Fault Detection and Control Systems
