TL;DR
This paper identifies and addresses the failure of large reasoning models to abstain appropriately on unanswerable questions, proposing a method to improve their trustworthiness without sacrificing reasoning accuracy.
Contribution
The paper systematically analyzes abstention failures in LRMs and introduces a lightweight two-stage approach to enhance their ability to recognize and abstain from unanswerable questions.
Findings
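The proposed method significantly improves the abstention rate on unanswerable questions
Reasoning performance is maintained
LRMs can recognize the flaws in unanswerable questions yet still fail to abstain, revealing a misalignment between internal cognition and external response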
Abstract
Large reasoning models (LRMs) have shown remarkable progress on complex reasoning tasks. However, some questions posed to LRMs are inherently unanswerable, such as math problems lacking sufficient conditions. We find that LRMs continually fail to provide appropriate abstentions when confronted with these unanswerable questions. In this paper, we systematically analyze, investigate, and resolve this issue for trustworthy AI. We first conduct a detailed analysis of the distinct response behaviors of LRMs when facing unanswerable questions. Then, we show that LRMs possess sufficient cognitive capabilities to recognize the flaws in these questions. However, they fail to exhibit appropriate abstention behavior, revealing a misalignment between their internal cognition and external response. Finally, to resolve this issue, we propose a lightweight, two-stage method that combines cognitive…