TL;DR
Prompt4Trust is a reinforcement learning framework that enhances confidence calibration in multimodal large language models for healthcare, improving safety, trustworthiness, and task accuracy, especially in clinical decision-making contexts.
Contribution
It introduces the first RL-based prompt augmentation method specifically designed for confidence calibration in multimodal large language models for healthcare.
Findings
Achieved state-of-the-art medical VQA performance on the PMC-VQA benchmark.
Improved confidence calibration, bringing the model's expressed confidence closer to its actual accuracy.
Demonstrated zero-shot generalization to larger models.
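To make "calibration" concrete, here is a minimal sketch of expected calibration error (ECE), a commonly used calibration metric. This summary does not say which metric the authors report, so the function name, the equal-width binning, and the default of 10 bins are illustrative assumptions rather than the paper's evaluation code.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Illustrative ECE: bin-size-weighted mean of |accuracy - mean confidence|.

    confidences: stated confidences in [0, 1], one per answer.
    correct:     booleans, True where the answer was right.
    n_bins:      number of equal-width confidence bins (an assumed default).
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    total = len(confidences)
    if total == 0:
        return 0.0

    # Group (confidence, correctness) pairs into equal-width bins.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[index].append((conf, ok))

    ece = 0.0
    for items in bins:
        if not items:
            continue
        mean_conf = sum(conf for conf, _ in items) / len(items)
        accuracy = sum(1 for _, ok in items if ok) / len(items)
        # Weight each bin's confidence/accuracy gap by its share of the samples.
        ece += (len(items) / total) * abs(accuracy - mean_conf)
    return ece


# A model that is always fully confident and always wrong is maximally miscalibrated.
overconfident = expected_calibration_error([1.0, 1.0], [False, False])
# A model that is fully confident and always right is perfectly calibrated.
well_calibrated = expected_calibration_error([1.0, 1.0], [True, True])
```

A lower value means stated confidence tracks accuracy more closely. The high-confidence, low-accuracy case is the failure mode the abstract singles out as risky in clinical use.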
Abstract
Multimodal large language models (MLLMs) hold considerable promise for applications in healthcare. However, their deployment in safety-critical settings is hindered by two key limitations: (i) sensitivity to prompt design, and (ii) a tendency to generate incorrect responses with high confidence. As clinicians may rely on a model's stated confidence to gauge the reliability of its predictions, it is especially important that when a model expresses high confidence, it is also highly accurate. We introduce Prompt4Trust, the first reinforcement learning (RL) framework for prompt augmentation targeting confidence calibration in MLLMs. A lightweight LLM is trained to produce context-aware auxiliary prompts that guide a downstream task MLLM to generate responses in which the expressed confidence more accurately reflects predictive accuracy. Unlike conventional calibration techniques,…
