Probabilistic Medical Predictions of Large Language Models

Bowen Gu; Rishi J. Desai; Kueiyu Joshua Lin; Jie Yang

arXiv:2408.11316·cs.AI·December 5, 2024

Open Access

TL;DR

This paper evaluates the reliability of probability estimates from large language models in medical predictions, finding that implicit probabilities outperform explicit ones, especially in smaller models and imbalanced datasets.

Contribution

It provides a comparative analysis of explicit versus implicit probability estimates in LLMs for clinical predictions, highlighting their limitations and areas for improvement.

Findings

01. Implicit probabilities outperform explicit probabilities in discrimination, precision, and recall.

02. Smaller LLMs and imbalanced datasets exacerbate probability estimation issues.

03. Explicit prompts often lead to unreliable probability estimates.

Abstract

Large Language Models (LLMs) have shown promise in clinical applications through prompt engineering, allowing flexible clinical predictions. However, they struggle to produce reliable prediction probabilities, which are crucial for transparency and decision-making. While explicit prompts can lead LLMs to generate probability estimates, their numerical reasoning limitations raise concerns about reliability. We compared explicit probabilities from text generation to implicit probabilities derived from the likelihood of predicting the correct label token. Across six advanced open-source LLMs and five medical datasets, explicit probabilities consistently underperformed implicit probabilities in discrimination, precision, and recall. This discrepancy is more pronounced with smaller LLMs and imbalanced datasets, highlighting the need for cautious interpretation, improved probability…
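The distinction between the two kinds of probability can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the "yes"/"no" label tokens, the regular expression, and the choice to renormalize over only the candidate label tokens are all assumptions made for illustration. The label-token log-probabilities are taken as given inputs; in practice they would come from whatever model API exposes token-level scores.

```python
import math
import re


def implicit_probability(label_logprobs, positive="yes"):
    """Implicit estimate: turn the model's log-probabilities for the
    candidate label tokens into a probability for the positive label.

    Assumption: we renormalize over the candidate labels only (a softmax
    restricted to those tokens). The paper may normalize differently.
    """
    max_lp = max(label_logprobs.values())  # subtract max for numerical stability
    weights = {tok: math.exp(lp - max_lp) for tok, lp in label_logprobs.items()}
    return weights[positive] / sum(weights.values())


def explicit_probability(generated_text):
    """Explicit estimate: parse the first number the model wrote in its
    text answer. Returns None when no usable probability is found.

    Assumption: the prompt asked for a probability, and the answer is
    either a value in [0, 1] or a percentage such as "80%".
    """
    match = re.search(r"(\d+(?:\.\d+)?)\s*(%)?", generated_text)
    if match is None:
        return None
    value = float(match.group(1))
    if match.group(2):  # a percent sign followed the number
        value /= 100
    return value if 0.0 <= value <= 1.0 else None


# Hypothetical inputs, not data from the paper.
p_implicit = implicit_probability({"yes": math.log(0.6), "no": math.log(0.2)})
p_explicit = explicit_probability("The probability is 80%.")
```

The sketch shows why the two estimates can disagree: the implicit value is read directly from the model's token distribution, while the explicit value depends on the model writing a sensible number and on that number being parsed correctly.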

Taxonomy

Topics: Machine Learning in Healthcare · Topic Modeling