Knowing When to Quit: A Principled Framework for Dynamic Abstention in LLM Reasoning
Hen Davidov, Nachshon Cohen, Oren Kalinsky, Yaron Fairstein, Guy Kushilevitz, Ram Yazdi, Patrick Rebeschini

TL;DR
This paper introduces a formal, reinforcement learning-based framework for dynamic abstention in LLM reasoning, enabling early termination of unpromising reasoning traces to save compute and improve accuracy.
Contribution
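The paper provides a principled, theoretically grounded rule for dynamic mid-generation abstention: abstain when the value function falls below an abstention reward. The authors show that this rule strictly outperforms natural baselines under general conditions.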
Findings
Abstaining when the value function is low improves accuracy and compute efficiency.
The proposed method outperforms natural baselines in mathematical reasoning and toxicity tasks.
A derived approximation of the value function is both principled and computationally efficient.
Abstract
LLMs utilizing chain-of-thought reasoning often waste substantial compute by producing long, incorrect responses. Abstention can mitigate this by withholding outputs unlikely to be correct. While most abstention methods decide to withhold outputs before or after generation, dynamic mid-generation abstention considers early termination of unpromising reasoning traces at each token position. Prior work has explored empirical variants of this idea, but principled guidance for the abstention rule remains lacking. We present a formal analysis of dynamic abstention for LLMs, modeling abstention as an explicit action within a regularized reinforcement learning framework. An abstention reward parameter controls the trade-off between compute and information. We show that abstaining when the value function falls below this reward strictly outperforms natural baselines under general conditions. We…
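The thresholding rule described in the abstract (abstain once the value function falls below the abstention reward) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `first_abstention_step` and `tokens_saved` are hypothetical, and the per-position value estimates are assumed to be supplied by some external estimator; the paper's own approximation of the value function is not reproduced here.

```python
from typing import Optional, Sequence


def first_abstention_step(
    values: Sequence[float], abstention_reward: float
) -> Optional[int]:
    """Return the first position whose value estimate is below the
    abstention reward, or None if the trace is never abandoned.

    `values[t]` is an assumed, externally supplied estimate of the
    value function at token position t (hypothetical input).
    """
    for t, value in enumerate(values):
        if value < abstention_reward:
            return t
    return None


def tokens_saved(values: Sequence[float], abstention_reward: float) -> int:
    """Count positions that are never generated because the trace
    was abandoned at the first below-threshold position."""
    step = first_abstention_step(values, abstention_reward)
    return 0 if step is None else len(values) - step


# Illustrative, made-up value estimates for one reasoning trace.
trace_values = [0.8, 0.6, 0.3, 0.1]
print(first_abstention_step(trace_values, abstention_reward=0.4))
print(tokens_saved(trace_values, abstention_reward=0.4))
```

In this sketch, raising `abstention_reward` makes the rule abandon traces earlier (more compute saved, more answers withheld), while lowering it lets more traces run to completion, which mirrors the trade-off the abstract attributes to the abstention reward parameter.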
