Position Paper On Diagnostic Uncertainty Estimation from Large Language Models: Next-Word Probability Is Not Pre-test Probability

Yanjun Gao; Skatje Myers; Shan Chen; Dmitriy Dligach; Timothy A. Miller; Danielle Bitterman; Guanhua Chen; Anoop Mayampurath; Matthew Churpek; Majid Afshar

arXiv:2411.04962·cs.AI·November 8, 2024

TL;DR

This paper critically examines the limitations of current large language models in estimating diagnostic pre-test probabilities, emphasizing the need for improved confidence estimation techniques in clinical decision support.

Contribution

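The paper evaluates two LLMs, Mistral-7B and Llama3-70B, on three diagnosis tasks using structured electronic health record data, examines three existing methods of extracting probability estimates from the models, and highlights the limitations of those methods for clinical use.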

Findings

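01. The three methods examined for extracting probability estimates from LLMs have significant limitations.

02. Structured electronic health record (EHR) data was used to evaluate the LLMs on three diagnosis tasks.

03. Improved LLM confidence estimation techniques are needed for clinical decision support.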

Abstract

Large language models (LLMs) are being explored for diagnostic decision support, yet their ability to estimate pre-test probabilities, vital for clinical decision-making, remains limited. This study evaluates two LLMs, Mistral-7B and Llama3-70B, using structured electronic health record data on three diagnosis tasks. We examined three current methods of extracting LLM probability estimations and revealed their limitations. We aim to highlight the need for improved techniques in LLM confidence estimation.
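To make the title's distinction concrete, the following is a minimal sketch of one common way a "next-word probability" is read off a model: take the next-token logits after a yes/no diagnostic question, apply a softmax, and renormalize over the two answer tokens. The function names, token strings, and logit values are hypothetical and chosen only for illustration; this listing does not specify the paper's three extraction methods, so the sketch should not be read as any of them.

```python
import math


def softmax(logits):
    """Convert raw next-token logits into probabilities that sum to 1."""
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(val - peak) for tok, val in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def yes_probability(logits, yes_token="Yes", no_token="No"):
    """Renormalize the next-token distribution over the two answer tokens.

    The result describes how the model distributes mass between two words.
    Nothing in this computation ties it to disease prevalence or to how
    often the diagnosis is correct.
    """
    probs = softmax(logits)
    p_yes, p_no = probs[yes_token], probs[no_token]
    return p_yes / (p_yes + p_no)


# Hypothetical logits for the token following a prompt such as
# "Does this patient have sepsis? Answer Yes or No:" (values are made up).
example_logits = {"Yes": 2.0, "No": 1.0, "The": 0.5, "Maybe": 0.0}
p = yes_probability(example_logits)
print(f"Normalized next-token probability of 'Yes': {p:.3f}")
```

A number produced this way is a property of the model's vocabulary distribution. Treating it as a pre-test probability would additionally require evidence that it is calibrated against observed diagnosis rates, which is the gap the paper draws attention to.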

Taxonomy

Topics: Topic Modeling