Mind the Ambiguity: Aleatoric Uncertainty Quantification in LLMs for Safe Medical Question Answering

Yaokun Liu, Yifan Liu, Phoebe Mbuvi, Zelin Li, Ruichen Yao, Gawon Lim, and Dong Wang

arXiv:2601.17284 · cs.CL · January 27, 2026

Open Access

TL;DR

This paper addresses the challenge of input ambiguity in medical question answering by quantifying aleatoric uncertainty, creating a benchmark, and proposing an efficient framework that improves answer accuracy and safety without requiring model fine-tuning.

Contribution

It introduces CV-MedBench for studying ambiguity in Medical QA and develops AU-Probe, a lightweight module for detecting input ambiguity to enhance safety and accuracy.

Findings

01. AU is linearly encoded in LLM activation patterns

02. AU-Probe effectively detects input ambiguity without fine-tuning

03. QA accuracy improves by 9.48% on average with the proposed framework
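The first finding says that AU is linearly encoded in activations, which suggests that a linear classifier over hidden states can separate ambiguous from unambiguous inputs. The following is a minimal sketch of that idea, not the paper's AU-Probe: it fits a logistic-regression probe on synthetic vectors that stand in for hidden states. The function names, the dimensionality, the `shift` parameter, and the synthetic data are all assumptions made for illustration.

```python
import math
import random


def make_synthetic_activations(n=200, dim=8, shift=3.0, seed=0):
    """Generate toy 'hidden states' (an assumption, not real LLM activations).

    Label 1 (ambiguous) vectors are shifted along one fixed direction,
    label 0 (unambiguous) vectors are shifted the opposite way.
    """
    rng = random.Random(seed)
    direction = [rng.gauss(0, 1) for _ in range(dim)]
    norm = math.sqrt(sum(d * d for d in direction))
    direction = [d / norm for d in direction]
    xs, ys = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        sign = 1.0 if y == 1 else -1.0
        xs.append([rng.gauss(0, 1) + sign * shift * d for d in direction])
        ys.append(y)
    return xs, ys


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def probe_score(x, w, b):
    """Probability that activation vector x comes from an ambiguous input."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_linear_probe(xs, ys, lr=0.1, epochs=200):
    """Fit logistic regression with full-batch gradient descent."""
    dim, n = len(xs[0]), len(xs)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(xs, ys):
            err = probe_score(x, w, b) - y
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(xs, ys, w, b):
    """Fraction of examples where the thresholded probe matches the label."""
    hits = sum((probe_score(x, w, b) >= 0.5) == (y == 1) for x, y in zip(xs, ys))
    return hits / len(xs)


if __name__ == "__main__":
    xs, ys = make_synthetic_activations()
    w, b = train_linear_probe(xs[:150], ys[:150])
    print("held-out accuracy:", accuracy(xs[150:], ys[150:], w, b))
```

With real models, the input vectors would be hidden states extracted from a chosen layer; which layer and token position to use is not stated on this page.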

Abstract

The deployment of Large Language Models in Medical Question Answering is severely hampered by ambiguous user queries, a significant safety risk that demonstrably reduces answer accuracy in high-stakes healthcare settings. In this paper, we formalize this challenge by linking input ambiguity to aleatoric uncertainty (AU), which is the irreducible uncertainty arising from underspecified input. To facilitate research in this direction, we construct CV-MedBench, the first benchmark designed for studying input ambiguity in Medical QA. Using this benchmark, we analyze AU from a representation engineering perspective, revealing that AU is linearly encoded in LLMs' internal activation patterns. Leveraging this insight, we introduce a novel AU-guided "Clarify-Before-Answer" framework, which incorporates AU-Probe, a lightweight module that detects input ambiguity directly from hidden states.…
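The "Clarify-Before-Answer" idea in the abstract can be read as a routing step: score the query for ambiguity, then either ask a clarifying question or answer directly. The sketch below illustrates that control flow only. The function `clarify_before_answer`, the `threshold` value, and all three stub callables are hypothetical. In particular, `toy_ambiguity_score` is a word-count placeholder, whereas the paper's AU-Probe reads hidden states.

```python
from typing import Callable, Dict, Union


def clarify_before_answer(
    question: str,
    ambiguity_score: Callable[[str], float],
    answer: Callable[[str], str],
    ask_clarification: Callable[[str], str],
    threshold: float = 0.5,  # assumed cutoff, not taken from the paper
) -> Dict[str, Union[str, float]]:
    """Route a query to clarification or answering based on its AU score."""
    score = ambiguity_score(question)
    if score >= threshold:
        # Ambiguous input: request missing details instead of answering.
        return {"action": "clarify", "text": ask_clarification(question), "score": score}
    # Input looks sufficiently specified: answer directly.
    return {"action": "answer", "text": answer(question), "score": score}


def toy_ambiguity_score(question: str) -> float:
    """Placeholder scorer (assumption): very short questions count as ambiguous."""
    return 1.0 if len(question.split()) < 6 else 0.0


def toy_answer(question: str) -> str:
    """Stand-in for an LLM call that answers the question."""
    return "ANSWER: " + question


def toy_clarify(question: str) -> str:
    """Stand-in for an LLM call that asks for missing details."""
    return "Could you give more detail about: " + question


if __name__ == "__main__":
    for q in [
        "Is this dose safe?",
        "Is 500 mg of acetaminophen every six hours safe for a healthy adult?",
    ]:
        result = clarify_before_answer(q, toy_ambiguity_score, toy_answer, toy_clarify)
        print(result["action"], "|", result["text"])
```

Passing the scorer as a callable keeps the routing logic independent of how the ambiguity score is produced, so a probe over hidden states could replace the placeholder without changing the rest of the flow.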

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Machine Learning in Healthcare