Abstain-R1: Calibrated Abstention and Post-Refusal Clarification via Verifiable RL
Skylar Zhai, Jingcheng Liang, Dongyeop Kang

TL;DR
This paper introduces Abstain-R1, a 3B model trained with a clarification-aware RLVR reward so that it abstains from unanswerable questions and explains what information is missing instead of guessing, which improves reliability and interpretability.
Contribution
The paper proposes a clarification-aware RL reward that trains models to abstain from unanswerable queries and clarify what information is missing. The resulting model outperforms existing abstention methods and matches larger systems on unanswerable queries.
Findings
Abstain-R1 improves abstention and clarification on unanswerable queries.
It maintains strong performance on answerable queries.
Its behavior on unanswerable queries is competitive with that of larger models.
Abstract
Reinforcement fine-tuning improves the reasoning ability of large language models, but it can also encourage them to answer unanswerable queries by guessing or hallucinating missing information. Existing abstention methods either train models to produce generic refusals or encourage follow-up clarifications without verifying whether those clarifications identify the key missing information. We study queries that are clear in meaning but cannot be reliably resolved from the given information, and argue that a reliable model should not only abstain, but also explain what is missing. We propose a clarification-aware RLVR reward that, while rewarding correct answers on answerable queries, jointly optimizes explicit abstention and semantically aligned post-refusal clarification on unanswerable queries. Using this reward, we train Abstain-R1, a 3B model that improves abstention and…
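The reward structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the refusal markers, the `Example` fields, the 0.5/0.5 weights, and the token-overlap F1 used as a stand-in for semantic alignment are all assumptions chosen for illustration. It only shows the three-way behavior the abstract names: reward correct answers on answerable queries, reward explicit abstention on unanswerable ones, and add credit when the clarification matches the reference missing information.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical refusal markers; the paper's actual abstention format is not shown here.
REFUSAL_MARKERS = ("i cannot answer", "cannot be determined", "not enough information")


@dataclass
class Example:
    question: str
    answerable: bool
    gold_answer: Optional[str] = None   # reference answer for answerable queries
    missing_info: Optional[str] = None  # reference clarification for unanswerable queries


def is_abstention(response: str) -> bool:
    """Return True if the response contains an explicit refusal marker."""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)


def token_f1(pred: str, ref: str) -> float:
    """Token-overlap F1 over unique tokens; a crude stand-in for a semantic-alignment check."""
    pred_tokens = set(pred.lower().split())
    ref_tokens = set(ref.lower().split())
    overlap = len(pred_tokens & ref_tokens)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def reward(example: Example, response: str,
           abstain_weight: float = 0.5, clarify_weight: float = 0.5) -> float:
    """Assumed reward shape: correctness if answerable, abstention plus clarification otherwise."""
    if example.answerable:
        # Abstaining on an answerable query earns nothing.
        if is_abstention(response):
            return 0.0
        gold = (example.gold_answer or "").strip().lower()
        return 1.0 if gold and gold in response.lower() else 0.0
    # Unanswerable query: answering (guessing) earns nothing.
    if not is_abstention(response):
        return 0.0
    alignment = token_f1(response, example.missing_info or "")
    return abstain_weight + clarify_weight * alignment
```

Under these assumptions, a generic refusal on an unanswerable query earns only the abstention share of the reward, while a refusal that also names the missing information earns more. A real system would likely replace `token_f1` with a stronger verifier of semantic alignment.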
