ClinDet-Bench: Beyond Abstention, Evaluating Judgment Determinability of LLMs in Clinical Decision-Making

Yusuke Watanabe; Yohei Kobashi; Takeshi Kojima; Yusuke Iwasawa; Yasushi Okuno; Yutaka Matsuo

arXiv:2602.22771·cs.AI·February 27, 2026

Open Access

TL;DR

ClinDet-Bench is a new benchmark for assessing whether large language models can recognize when they have enough information to make clinical decisions, addressing a critical safety aspect in high-stakes medical applications.

Contribution

This paper introduces ClinDet-Bench, a benchmark that evaluates LLMs' ability to determine information sufficiency in clinical decision-making scenarios.

Findings

01. LLMs often fail to identify determinability, leading to premature judgments or unnecessary abstention.

02. Existing benchmarks do not adequately assess LLM safety in clinical contexts.

03. ClinDet-Bench offers a framework for evaluating and improving LLM judgment reliability.

Abstract

Clinical decisions are often required under incomplete information. Clinical experts must identify whether available information is sufficient for judgment, as both premature conclusion and unnecessary abstention can compromise patient safety. To evaluate this capability of large language models (LLMs), we developed ClinDet-Bench, a benchmark based on clinical scoring systems that decomposes incomplete-information scenarios into determinable and undeterminable conditions. Identifying determinability requires considering all hypotheses about missing information, including unlikely ones, and verifying whether the conclusion holds across them. We find that recent LLMs fail to identify determinability under incomplete information, producing both premature judgments and excessive abstention, despite correctly explaining the underlying scoring knowledge and performing well under complete…
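
The determinability check described in the abstract (consider every hypothesis about the missing information and verify whether the conclusion holds across all of them) can be illustrated with a small sketch. This is not the benchmark's implementation: the scoring rule, the item names (`item_a` to `item_d`), the point values, the threshold, and the function names are all hypothetical, and it assumes a simple additive score over binary findings.

```python
from itertools import product

# Hypothetical additive scoring rule (not a real clinical score):
# each item contributes its points when the finding is present.
ITEM_POINTS = {"item_a": 1, "item_b": 1, "item_c": 2, "item_d": 1}
THRESHOLD = 3  # hypothetical cutoff: score >= 3 is classified "high"


def classify(findings):
    """Apply the scoring rule to a complete set of binary findings."""
    score = sum(ITEM_POINTS[name] for name, present in findings.items() if present)
    return "high" if score >= THRESHOLD else "low"


def determinability(observed):
    """Return the conclusion if it holds for every completion of the
    missing items, or None if the case is undeterminable."""
    missing = [name for name in ITEM_POINTS if name not in observed]
    conclusions = set()
    # Enumerate every hypothesis about the missing items, including unlikely ones.
    for values in product([False, True], repeat=len(missing)):
        completed = {**observed, **dict(zip(missing, values))}
        conclusions.add(classify(completed))
    return conclusions.pop() if len(conclusions) == 1 else None


# Determinable: the observed items already reach the cutoff.
print(determinability({"item_a": True, "item_c": True}))
# Undeterminable: the missing items could push the score either way.
print(determinability({"item_a": True}))
```

Under these assumptions, answering "high" in the second case would be a premature judgment, while abstaining in the first case would be unnecessary abstention.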

Taxonomy

Topics: Machine Learning in Healthcare · Topic Modeling · Clinical Reasoning and Diagnostic Skills