Quantifying and Mitigating Premature Closure in Frontier LLMs
Rebecca Handler, Suhana Bedi, Nigam Shah

TL;DR
This paper investigates premature closure in large language models, defined as inappropriate commitment under uncertainty, and evaluates whether safety-oriented prompts reduce it in medical tasks.
Contribution
It introduces a formal definition of premature closure in LLMs and assesses safety prompts' effectiveness in mitigating it in medical applications.
Findings
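When the correct choice was removed from MedQA and AfriMed-QA questions, models still selected an answer, with baseline false-action rates of 55-81% and 53-82%, respectively.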
Safety prompts reduce premature closure but do not eliminate it.
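In open-ended evaluation, models gave inappropriate or unsafe answers on an average of 30% of 861 HealthBench questions and 78% of 191 physician-authored adversarial queries.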
Abstract
Premature closure, or committing to a conclusion before sufficient information is available, is a recognized contributor to diagnostic error but remains underexamined in large language models (LLMs). We define LLM premature closure as inappropriate commitment under uncertainty: providing an answer, recommendation, or clinical guidance when the safer response would be clarification, abstention, escalation, or refusal. We evaluated five frontier LLMs across structured and open-ended medical tasks. In MedQA (n = 500) and AfriMed-QA (n = 490) questions where the correct choice had been removed, models still selected an answer at high rates, with baseline false-action rates of 55-81% and 53-82%, respectively. In open-ended evaluation, models gave inappropriate answers on an average of 30% of 861 HealthBench questions and 78% of 191 physician-authored adversarial queries. Safety-oriented…
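The false-action rate reported for the removed-answer questions can be illustrated with a short calculation. The sketch below assumes the rate is the fraction of questions on which a model still commits to one of the remaining options instead of abstaining; the abstract does not spell out the exact scoring rule, so this definition, the `Response` type, and the toy data are hypothetical and not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Response:
    """One model reply to a question whose correct option was removed (hypothetical type)."""
    question_id: str
    selected_option: Optional[str]  # None = model abstained or said no option fits


def false_action_rate(responses: Sequence[Response]) -> float:
    """Fraction of replies that commit to an option anyway (assumed definition)."""
    if not responses:
        raise ValueError("need at least one response")
    committed = sum(1 for r in responses if r.selected_option is not None)
    return committed / len(responses)


# Toy data, invented for illustration only (not taken from the paper).
toy = [
    Response("q1", "B"),
    Response("q2", None),
    Response("q3", "A"),
    Response("q4", "D"),
]
rate = false_action_rate(toy)
print(f"false-action rate: {rate:.0%}")
```

Under this assumed definition, a lower rate means the model more often declines to pick among options that are all incorrect, which is the safer behavior the abstract describes.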
