Knowing When Not to Answer: Abstention-Aware Scientific Reasoning
Samir Abdaljalil, Erchin Serpedin, Hasan Kurban

TL;DR
This paper introduces an abstention-aware framework for scientific reasoning with language models, emphasizing the importance of abstaining when evidence is insufficient to avoid harmful unsupported conclusions.
Contribution
It proposes a novel abstention-aware verification framework that improves scientific claim evaluation by selectively abstaining based on evidence, applicable across diverse models and benchmarks.
Findings
Confidence-based abstention reduces error risk significantly.
Abstention plays a crucial role in controlling model errors.
Model accuracy varies modestly, but abstention improves reliability.
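To make the findings above concrete, here is a minimal sketch of how confidence-based abstention trades coverage for lower error risk. It is not the paper's evaluation code: the function name `selective_risk`, the threshold rule, and the toy numbers are assumptions made for illustration only.

```python
from typing import List, Tuple


def selective_risk(
    confidences: List[float], correct: List[bool], threshold: float
) -> Tuple[float, float]:
    """Return (coverage, risk) when the model answers only if confidence >= threshold.

    coverage = fraction of examples answered (not abstained on)
    risk     = error rate among the answered examples (0.0 if none answered)
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        return 0.0, 0.0
    # Keep the correctness flag only for examples the model chooses to answer.
    answered = [ok for conf, ok in zip(confidences, correct) if conf >= threshold]
    coverage = len(answered) / len(confidences)
    if not answered:
        return coverage, 0.0
    risk = sum(1 for ok in answered if not ok) / len(answered)
    return coverage, risk


# Toy data (hypothetical): the two low-confidence predictions are the wrong ones.
toy_conf = [0.9, 0.8, 0.6, 0.4]
toy_correct = [True, True, False, False]
always_answer = selective_risk(toy_conf, toy_correct, threshold=0.0)
with_abstention = selective_risk(toy_conf, toy_correct, threshold=0.7)
```

In this toy setup, raising the threshold lowers coverage while removing the errors made at low confidence. Whether that holds in practice depends on how well a model's confidence tracks its correctness.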
Abstract
Large language models are increasingly used to answer and verify scientific claims, yet existing evaluations typically assume that a model must always produce a definitive answer. In scientific settings, however, unsupported or uncertain conclusions can be more harmful than abstaining. We study this problem through an abstention-aware verification framework that decomposes scientific claims into minimal conditions, audits each condition against available evidence using natural language inference (NLI), and selectively decides whether to support, refute, or abstain. We evaluate this framework across two complementary scientific benchmarks: SciFact and PubMedQA, covering both closed-book and open-domain evidence settings. Experiments are conducted with six diverse language models, including encoder-decoder models, open-weight chat models, and proprietary APIs. Across all benchmarks and models,…
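The decompose, audit, and decide steps described in the abstract can be sketched as follows. This is an illustrative outline, not the authors' implementation: the names `verify`, `toy_nli`, and `tau`, the label strings, and the specific decision rule (refute on any confident contradiction, support only if every condition is confidently entailed, otherwise abstain) are all assumptions. Claim decomposition is taken as given input here, and the NLI model is replaced by a hypothetical lookup table.

```python
from typing import Callable, List, Tuple

# Assumed NLI interface: (evidence, condition) -> (label, confidence),
# with label in {"entailment", "contradiction", "neutral"}.
NliFn = Callable[[str, str], Tuple[str, float]]


def verify(conditions: List[str], evidence: str, nli: NliFn, tau: float = 0.8) -> str:
    """Return "support", "refute", or "abstain" for a claim split into conditions.

    Assumed rule (not taken from the paper):
      - refute  if any condition is contradicted with confidence >= tau
      - support if every condition is entailed with confidence >= tau
      - abstain otherwise, including when there are no conditions
    """
    if not conditions:
        return "abstain"
    results = [nli(evidence, cond) for cond in conditions]
    if any(label == "contradiction" and conf >= tau for label, conf in results):
        return "refute"
    if all(label == "entailment" and conf >= tau for label, conf in results):
        return "support"
    return "abstain"


def toy_nli(evidence: str, condition: str) -> Tuple[str, float]:
    """Hypothetical stand-in for a real NLI model: returns canned judgments."""
    canned = {
        "drug X lowers blood pressure": ("entailment", 0.95),
        "the effect holds in adults": ("entailment", 0.90),
        "the effect holds in children": ("neutral", 0.55),
        "drug X raises blood pressure": ("contradiction", 0.92),
    }
    return canned.get(condition, ("neutral", 0.5))


evidence_text = "A trial in adults found that drug X lowers blood pressure."
supported = verify(
    ["drug X lowers blood pressure", "the effect holds in adults"],
    evidence_text, toy_nli,
)
abstained = verify(
    ["drug X lowers blood pressure", "the effect holds in children"],
    evidence_text, toy_nli,
)
refuted = verify(["drug X raises blood pressure"], evidence_text, toy_nli)
```

Under this assumed rule, a single condition without confident entailment is enough to trigger abstention, which is what makes the decision selective rather than forced.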
Taxonomy
Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications
