Uncertainty Estimation of Large Language Models in Medical Question Answering
Jiaxin Wu, Yizhou Yu, Hong-Yu Zhou

TL;DR
This paper evaluates uncertainty estimation methods for large language models in medical question answering, finds current methods lacking, and proposes a novel two-phase verification approach that improves reliability and scales with model size.
Contribution
It benchmarks existing uncertainty estimation techniques in medical QA, identifies their limitations, and introduces a new probability-free method that enhances uncertainty detection in LLMs.
Findings
Current uncertainty estimation (UE) methods perform poorly in medical QA.
Larger models tend to yield more reliable uncertainty estimates.
The proposed Two-phase Verification outperforms baseline methods.
Abstract
Large Language Models (LLMs) show promise for natural language generation in healthcare, but risk hallucinating factually incorrect information. Deploying LLMs for medical question answering necessitates reliable uncertainty estimation (UE) methods to detect hallucinations. In this work, we benchmark popular UE methods with different model sizes on medical question-answering datasets. Our results show that current approaches generally perform poorly in this domain, highlighting the challenge of UE for medical applications. We also observe that larger models tend to yield better results, suggesting a correlation between model size and the reliability of UE. To address these challenges, we propose Two-phase Verification, a probability-free Uncertainty Estimation approach. First, an LLM generates a step-by-step explanation alongside its initial answer, followed by formulating verification…
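To make the benchmarking idea concrete, here is a minimal sketch of how a UE method could be scored as a hallucination detector. The excerpt above does not say which metric the paper uses; AUROC is assumed here only because it is a common choice for this kind of evaluation. The function name `auroc` and the toy numbers are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def auroc(uncertainty: Sequence[float], is_wrong: Sequence[bool]) -> float:
    """Probability that a wrong answer receives a higher uncertainty score
    than a correct one (ties count as half).

    Assumption: a higher score means the model is less certain.
    """
    wrong = [u for u, w in zip(uncertainty, is_wrong) if w]
    right = [u for u, w in zip(uncertainty, is_wrong) if not w]
    if not wrong or not right:
        raise ValueError("need at least one correct and one incorrect answer")

    wins = 0.0
    for u_wrong in wrong:
        for u_right in right:
            if u_wrong > u_right:
                wins += 1.0
            elif u_wrong == u_right:
                wins += 0.5
    return wins / (len(wrong) * len(right))


# Toy data (made up for illustration): four answers, two of them incorrect.
scores = [0.9, 0.4, 0.7, 0.2]
wrong_flags = [True, True, False, False]
print(auroc(scores, wrong_flags))  # 0.75
```

A value near 0.5 means the uncertainty score separates wrong answers from correct ones no better than chance, while a value near 1.0 means wrong answers are consistently flagged as more uncertain.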
Taxonomy
Topics: Topic Modeling · Expert finding and Q&A systems
Methods: LLaMA
