Knowing When to Abstain: Medical LLMs Under Clinical Uncertainty

Sravanthi Machcha; Sushrita Yerra; Sahil Gupta; Aishwarya Sahoo; Sharmin Sultana; Hong Yu; Zonghai Yao

arXiv:2601.12471·cs.CL·January 23, 2026

TL;DR

This paper introduces MedAbstain, a benchmark for evaluating medical LLMs' ability to abstain under uncertainty, revealing current models' limitations and guiding safer deployment in critical settings.

Contribution

The paper presents MedAbstain, a new benchmark and evaluation protocol for assessing abstention in medical LLMs, emphasizing the importance of explicit abstention options for safety.

Findings

1. Explicit abstention options increase expressed model uncertainty and lead to safer abstention.

2. Larger models and advanced prompts offer limited improvements.

3. Input perturbations are less effective than explicit abstention options.

Abstract

Current evaluation of large language models (LLMs) overwhelmingly prioritizes accuracy; however, in real-world and safety-critical applications, the ability to abstain when uncertain is equally vital for trustworthy deployment. We introduce MedAbstain, a unified benchmark and evaluation protocol for abstention in medical multiple-choice question answering (MCQA), a discrete-choice setting that generalizes to agentic action selection, integrating conformal prediction, adversarial question perturbations, and explicit abstention options. Our systematic evaluation of both open- and closed-source LLMs reveals that even state-of-the-art, high-accuracy models often fail to abstain when uncertain. Notably, providing explicit abstention options consistently increases model uncertainty and promotes safer abstention, far more than input perturbations, while scaling model size or advanced prompting…
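The abstract names conformal prediction as one ingredient of the benchmark but gives no details here. As a rough illustration only, the following is a minimal sketch of generic split conformal prediction for a multiple-choice question, not the MedAbstain protocol itself. The function names, the nonconformity score (1 minus the probability of an option), the toy probabilities, and the rule "abstain unless the prediction set contains exactly one option" are all assumptions made for this example.

```python
import math


def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Calibrate a score threshold on held-out questions with known answers.

    cal_probs: list of dicts mapping option letter -> model probability.
    cal_labels: list of correct option letters, same length as cal_probs.
    The nonconformity score is 1 - p(correct option) (an assumed choice).
    """
    scores = sorted(1.0 - probs[label] for probs, label in zip(cal_probs, cal_labels))
    n = len(scores)
    # Finite-sample corrected rank used by split conformal prediction.
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: every option is kept.
        return float("inf")
    return scores[rank - 1]


def prediction_set(probs, qhat):
    """Keep every option whose score 1 - p is within the calibrated threshold."""
    return [opt for opt, p in probs.items() if 1.0 - p <= qhat]


def answer_or_abstain(probs, qhat):
    """Hypothetical abstention rule: answer only when exactly one option survives."""
    options = prediction_set(probs, qhat)
    return options[0] if len(options) == 1 else "ABSTAIN"


# Toy calibration data (illustrative numbers, not results from the paper).
cal_probs = [
    {"A": 0.90, "B": 0.05, "C": 0.03, "D": 0.02},
    {"A": 0.20, "B": 0.70, "C": 0.05, "D": 0.05},
    {"A": 0.10, "B": 0.10, "C": 0.60, "D": 0.20},
    {"A": 0.25, "B": 0.25, "C": 0.10, "D": 0.40},
]
cal_labels = ["A", "B", "C", "D"]
qhat = conformal_threshold(cal_probs, cal_labels, alpha=0.25)

confident = {"A": 0.85, "B": 0.05, "C": 0.05, "D": 0.05}
ambiguous = {"A": 0.45, "B": 0.45, "C": 0.05, "D": 0.05}
print(answer_or_abstain(confident, qhat))
print(answer_or_abstain(ambiguous, qhat))
```

Under these assumptions, a question where one option dominates yields a single-option set and is answered, while a question where two options are nearly tied yields a larger set and triggers abstention. An empty set is also treated as a reason to abstain in this sketch.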

Taxonomy

Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Machine Learning in Healthcare