Uncertainty Estimation of Large Language Models in Medical Question Answering

Jiaxin Wu; Yizhou Yu; Hong-Yu Zhou

arXiv:2407.08662 · cs.CL · July 12, 2024 · 2 cites


Open Access

TL;DR

This paper evaluates uncertainty estimation methods for large language models in medical question answering, finds current methods lacking, and proposes a novel two-phase verification approach that improves reliability and scales with model size.

Contribution

It benchmarks existing uncertainty estimation techniques in medical QA, identifies their limitations, and introduces a new probability-free method that enhances uncertainty detection in LLMs.
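To make "probability-free" concrete, here is a minimal sketch of one common probability-based uncertainty score: the mean negative log-likelihood of the generated tokens. This summary does not say which probability-based baselines the benchmark includes. The function name `mean_token_nll` and the log-probabilities below are hypothetical illustrations.

```python
import math
from typing import Sequence


def mean_token_nll(token_logprobs: Sequence[float]) -> float:
    """Average negative log-likelihood per generated token (hypothetical helper).

    A higher value means the model gave lower probability to its own answer,
    which probability-based UE methods read as higher uncertainty.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return -sum(token_logprobs) / len(token_logprobs)


# Hypothetical natural-log token probabilities for two generated answers.
confident = [math.log(0.9), math.log(0.95), math.log(0.8)]
hesitant = [math.log(0.4), math.log(0.3), math.log(0.5)]

print(round(mean_token_nll(confident), 3), round(mean_token_nll(hesitant), 3))
```

A score like this needs token-level probabilities from the model. A probability-free method works from the generated text alone.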

Findings

01. Current UE methods perform poorly in medical QA.
02. Larger models tend to have better uncertainty estimation.
03. The proposed Two-phase Verification outperforms baseline methods.
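One common way to quantify how well a UE method detects errors is AUROC: the probability that a randomly chosen incorrect answer receives a higher uncertainty score than a randomly chosen correct one. A value of 0.5 is chance level. This summary does not name the benchmark's metric, so AUROC is an assumption here. The function `auroc` and the example scores are hypothetical.

```python
from typing import Sequence


def auroc(uncertainty: Sequence[float], is_correct: Sequence[bool]) -> float:
    """AUROC for flagging incorrect answers by their uncertainty scores.

    Computed by comparing every (incorrect, correct) pair; ties count as half.
    0.5 means the scores do no better than chance, 1.0 is perfect separation.
    """
    if len(uncertainty) != len(is_correct):
        raise ValueError("scores and labels must have the same length")
    wrong = [u for u, ok in zip(uncertainty, is_correct) if not ok]
    right = [u for u, ok in zip(uncertainty, is_correct) if ok]
    if not wrong or not right:
        raise ValueError("need at least one correct and one incorrect answer")
    wins = 0.0
    for w in wrong:
        for r in right:
            if w > r:
                wins += 1.0
            elif w == r:
                wins += 0.5
    return wins / (len(wrong) * len(right))


# Hypothetical uncertainty scores for five answers and whether each was correct.
scores = [0.9, 0.2, 0.7, 0.4, 0.1]
labels = [False, True, True, False, True]
print(auroc(scores, labels))
```

The pairwise loop is quadratic but easy to check by hand. For large benchmarks a rank-based computation gives the same value faster.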

Abstract

Large Language Models (LLMs) show promise for natural language generation in healthcare, but risk hallucinating factually incorrect information. Deploying LLMs for medical question answering necessitates reliable uncertainty estimation (UE) methods to detect hallucinations. In this work, we benchmark popular UE methods across models of different sizes on medical question-answering datasets. Our results show that current approaches generally perform poorly in this domain, highlighting the challenge of UE for medical applications. We also observe that larger models tend to yield better results, suggesting a correlation between model size and the reliability of UE. To address these challenges, we propose Two-phase Verification, a probability-free uncertainty estimation approach. First, an LLM generates a step-by-step explanation alongside its initial answer, followed by formulating verification…
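Below is a minimal sketch of the Two-phase Verification flow. It assumes `llm` is any hypothetical function that maps a prompt string to a reply. The first phase follows the abstract: the model gives an answer with a step-by-step explanation and then writes verification questions. The abstract is cut off before it explains how those questions are used, so the second phase here is an assumption. Each question is answered twice, once on its own and once with the explanation as context, and uncertainty is the fraction of questions whose two answers disagree. The prompts, `fake_llm`, and the exact-match comparison are all illustrative.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical interface: any function that maps a prompt to a text reply.
LLM = Callable[[str], str]


@dataclass
class VerificationResult:
    answer: str
    explanation: str
    questions: List[str]
    uncertainty: float  # fraction of verification questions answered inconsistently


def _normalize(text: str) -> str:
    # Exact match after lowercasing and collapsing whitespace; a real system
    # would likely need a semantic comparison instead.
    return " ".join(text.lower().split())


def two_phase_verification(llm: LLM, question: str, n_questions: int = 3) -> VerificationResult:
    # Phase 1: initial answer plus a step-by-step explanation (answer on line 1).
    reply = llm(
        "Answer the question, then explain step by step.\n"
        "Put only the answer on the first line.\n"
        f"Question: {question}"
    )
    answer, _, explanation = reply.partition("\n")
    # Phase 1 (cont.): verification questions that probe the explanation's claims.
    raw = llm(
        f"Write {n_questions} short verification questions, one per line, "
        f"that check the claims in this explanation:\n{explanation}"
    )
    questions = [q.strip() for q in raw.splitlines() if q.strip()][:n_questions]
    if not questions:
        raise ValueError("the model produced no verification questions")
    # Phase 2 (ASSUMED, not stated in the truncated abstract): answer each
    # question independently and again with the explanation as context,
    # then count how often the two answers disagree.
    disagreements = 0
    for q in questions:
        independent = llm(f"Answer briefly.\nQuestion: {q}\nAnswer:")
        grounded = llm(
            f"Using this explanation:\n{explanation}\n"
            f"Answer briefly.\nQuestion: {q}\nAnswer:"
        )
        if _normalize(independent) != _normalize(grounded):
            disagreements += 1
    return VerificationResult(
        answer.strip(), explanation.strip(), questions, disagreements / len(questions)
    )


def fake_llm(prompt: str) -> str:
    """Hypothetical scripted stand-in for a real model, for demonstration only."""
    if prompt.startswith("Answer the question"):
        return ("Metformin\nMetformin is first-line therapy for type 2 diabetes "
                "and lowers hepatic glucose output.")
    if prompt.startswith("Write"):
        return ("Is metformin first-line therapy for type 2 diabetes?\n"
                "Does metformin lower hepatic glucose output?")
    q = prompt.split("Question: ")[-1]
    if "first-line" in q:
        return "Yes"
    # Simulated inconsistency: only the explanation-grounded call says "Yes".
    return "Yes" if prompt.startswith("Using this explanation") else "No"


result = two_phase_verification(fake_llm, "Which drug is first-line for type 2 diabetes?")
print(result.answer, result.uncertainty)
```

Because the score depends only on generated text, this design needs no access to token probabilities, which is what makes it probability-free.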

Taxonomy

Topics: Topic Modeling · Expert finding and Q&A systems

Methods: LLaMA