Self-Verification Dilemma: Experience-Driven Suppression of Overused Checking in LLM Reasoning
Quanyu Long, Kai Jie Jiang, Jianda Chen, Xu Guo, Leilei Gan, Wenya Wang

TL;DR
This paper investigates the frequent but often unhelpful self-verification steps in large reasoning models and proposes a framework to suppress unnecessary rechecks, reducing token usage without sacrificing accuracy.
Contribution
It introduces an experience-driven test-time framework that detects and suppresses overused self-verification in large reasoning models, improving efficiency.
Findings
Reduces token usage by up to 20.3% across benchmarks
Most self-verification steps are confirmatory, not corrective
Suppression maintains or improves model accuracy
Abstract
Large Reasoning Models (LRMs) achieve strong performance by generating long reasoning traces with reflection. Through a large-scale empirical analysis, we find that a substantial fraction of reflective steps consists of self-verification (rechecks) that repeatedly confirm intermediate results. These rechecks occur frequently across models and benchmarks, yet the vast majority are confirmatory rather than corrective, rarely identifying errors or altering reasoning outcomes. This reveals a mismatch between how often self-verification is activated and how often it is actually useful. Motivated by this, we propose a novel, experience-driven test-time framework that reduces overused verification. Our method detects the activation of recheck behavior, consults an offline experience pool of past verification outcomes, and estimates whether a recheck is likely unnecessary via efficient…
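The three steps named in the abstract (detect a recheck, consult an experience pool, estimate whether the recheck is needed) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract is truncated before it says how the estimate is computed, so the cue phrases, the `Experience` record, the Jaccard word-overlap similarity, the k-nearest vote, and the `threshold` value below are all assumptions chosen only to make the idea concrete.

```python
from dataclasses import dataclass

# Hypothetical cue phrases that mark the start of a recheck (assumption).
RECHECK_CUES = ("wait", "let me double-check", "let me verify", "let me recheck")


@dataclass
class Experience:
    """One past verification outcome stored in the offline pool (assumed schema)."""
    context: str      # reasoning text that preceded the past recheck
    corrective: bool  # True if that recheck actually changed the result


def is_recheck(step: str) -> bool:
    """Detect recheck activation by matching a cue phrase at the start of a step."""
    return step.strip().lower().startswith(RECHECK_CUES)


def jaccard(a: str, b: str) -> float:
    """Word-set overlap; a stand-in for whatever similarity the paper uses."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def should_suppress(context: str, pool: list[Experience],
                    k: int = 3, threshold: float = 0.8) -> bool:
    """Suppress a recheck when most similar past rechecks were merely confirmatory."""
    if not pool:
        return False  # no experience available: keep the recheck
    nearest = sorted(pool, key=lambda e: jaccard(context, e.context), reverse=True)[:k]
    confirmatory = sum(1 for e in nearest if not e.corrective)
    return confirmatory / len(nearest) >= threshold


pool = [
    Experience("the sum 2 + 3 is 5", corrective=False),
    Experience("the sum 4 + 4 is 8", corrective=False),
    Experience("the sum 1 + 6 is 7", corrective=False),
    Experience("integrate by parts with u = x and dv = e^x dx", corrective=True),
]

step = "Wait, let me verify that addition."
if is_recheck(step) and should_suppress("the sum 2 + 5 is 7", pool):
    print("skip recheck")
```

Falling back to keeping the recheck when the pool is empty is a conservative design choice in this sketch: suppression only happens when past experience supports it.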
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Advanced Graph Neural Networks
