Mind the Ambiguity: Aleatoric Uncertainty Quantification in LLMs for Safe Medical Question Answering
Yaokun Liu, Yifan Liu, Phoebe Mbuvi, Zelin Li, Ruichen Yao, Gawon Lim, and Dong Wang

TL;DR
This paper addresses the challenge of input ambiguity in medical question answering by quantifying aleatoric uncertainty, creating a benchmark, and proposing an efficient framework that improves answer accuracy and safety without requiring model fine-tuning.
Contribution
The paper introduces CV-MedBench, a benchmark for studying input ambiguity in Medical QA, and develops AU-Probe, a lightweight module that detects input ambiguity to improve safety and accuracy.
Findings
Aleatoric uncertainty (AU) is linearly encoded in LLM activation patterns
AU-Probe effectively detects input ambiguity without fine-tuning
QA accuracy improves by 9.48% on average with the proposed framework
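To make the "linearly encoded" finding concrete, here is a minimal sketch of what a linear probe over activations could look like. It is not the paper's AU-Probe: the activations are synthetic Gaussian vectors, the "ambiguity direction" is planted by hand, and the names train_linear_probe, probe_scores, and make_synthetic_activations are hypothetical. It only illustrates that, when a property is linearly encoded, a logistic regression over the activation vector is enough to read it out.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_linear_probe(X, y, lr=0.5, steps=500):
    """Fit a logistic-regression probe with batch gradient descent."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(steps):
        p = sigmoid(X @ w + b)          # predicted P(ambiguous)
        grad = p - y                    # gradient of the log loss w.r.t. logits
        w -= lr * (X.T @ grad) / n
        b -= lr * grad.mean()
    return w, b


def probe_scores(X, w, b):
    """Return an ambiguity score in (0, 1) for each activation vector."""
    return sigmoid(X @ w + b)


def make_synthetic_activations(n=400, d=16, seed=0):
    """ASSUMPTION: stand-in for real hidden states; label 1 = ambiguous input."""
    rng = np.random.default_rng(seed)
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)
    y = rng.integers(0, 2, size=n).astype(float)
    # Shift each sample along one direction according to its label.
    X = rng.normal(size=(n, d)) + 3.0 * np.outer(2.0 * y - 1.0, direction)
    return X, y


X, y = make_synthetic_activations()
w, b = train_linear_probe(X[:300], y[:300])
predictions = probe_scores(X[300:], w, b) > 0.5
accuracy = (predictions == (y[300:] == 1)).mean()
print(f"held-out probe accuracy: {accuracy:.2f}")
```

With real models, X would instead hold hidden states extracted at some layer for each question; which layer and token position to use is left open here because the summary does not specify it.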
Abstract
The deployment of Large Language Models in Medical Question Answering is severely hampered by ambiguous user queries, a significant safety risk that demonstrably reduces answer accuracy in high-stakes healthcare settings. In this paper, we formalize this challenge by linking input ambiguity to aleatoric uncertainty (AU), which is the irreducible uncertainty arising from underspecified input. To facilitate research in this direction, we construct CV-MedBench, the first benchmark designed for studying input ambiguity in Medical QA. Using this benchmark, we analyze AU from a representation engineering perspective, revealing that AU is linearly encoded in LLMs' internal activation patterns. Leveraging this insight, we introduce a novel AU-guided "Clarify-Before-Answer" framework, which incorporates AU-Probe, a lightweight module that detects input ambiguity directly from hidden states.…
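The abstract is cut off before it explains how the framework acts on the probe's output, so the following is only a guess at the control flow that the name "Clarify-Before-Answer" suggests: score the question for ambiguity, ask for clarification if the score is high, and answer otherwise. Everything in it is hypothetical, including the threshold of 0.5, the Routed record, and the stub functions, which stand in for a real probe and a real LLM.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Routed:
    action: str      # "clarify" or "answer"
    text: str        # clarification request or model answer
    au_score: float  # ambiguity score that drove the decision


def clarify_before_answer(
    question: str,
    au_score_fn: Callable[[str], float],
    answer_fn: Callable[[str], str],
    clarify_fn: Callable[[str], str],
    threshold: float = 0.5,  # ASSUMPTION: not a value from the paper
) -> Routed:
    """Route a question to clarification or answering based on its AU score."""
    score = au_score_fn(question)
    if score >= threshold:
        return Routed("clarify", clarify_fn(question), score)
    return Routed("answer", answer_fn(question), score)


# Placeholder components. A real system would compute the score from hidden
# states and call an LLM; these stubs exist only so the sketch runs.
def toy_au_score(question: str) -> float:
    return 0.9 if len(question.split()) < 6 else 0.1


def toy_answer(question: str) -> str:
    return f"[model answer to: {question}]"


def toy_clarify(question: str) -> str:
    return "Could you give more detail before I answer?"


vague = clarify_before_answer(
    "Is this dose safe?", toy_au_score, toy_answer, toy_clarify
)
specific = clarify_before_answer(
    "Is this dose safe for an adult with no kidney problems?",
    toy_au_score, toy_answer, toy_clarify,
)
print(vague.action, specific.action)
```

Keeping the scorer, answerer, and clarifier as injected callables means the gating logic can be tested without a model, which matches the summary's claim that the approach needs no fine-tuning of the underlying LLM.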
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Machine Learning in Healthcare
