VeriThinker: Learning to Verify Makes Reasoning Model Efficient
Zigeng Chen, Xinyin Ma, Gongfan Fang, Ruonan Yu, Xinchao Wang

TL;DR
VeriThinker fine-tunes large reasoning models on a verification task, which shortens their reasoning chains and lowers inference cost while maintaining or slightly improving accuracy. The method also generalizes zero-shot to speculative reasoning.
Contribution
The paper proposes a CoT compression approach that fine-tunes LRMs solely on an auxiliary verification task, rather than on synthetic concise CoT data for the original reasoning task.
Findings
Reduces reasoning tokens by over 40% on MATH500 and AIME25 datasets.
Achieves slight accuracy improvements while decreasing inference costs.
Demonstrates zero-shot generalization to speculative reasoning tasks.
Abstract
Large Reasoning Models (LRMs) excel at complex tasks using Chain-of-Thought (CoT) reasoning. However, their tendency to overthink leads to unnecessarily lengthy reasoning chains, dramatically increasing inference costs. To mitigate this issue, we introduce VeriThinker, a novel approach for CoT compression. Unlike conventional methods that fine-tune LRMs directly on the original reasoning task using synthetic concise CoT data, we fine-tune the model solely through an auxiliary verification task. By training LRMs to accurately verify the correctness of CoT solutions, the LRMs inherently become more discerning about the necessity of subsequent self-reflection steps, thereby effectively suppressing overthinking. Extensive experiments validate that VeriThinker substantially reduces reasoning chain lengths while maintaining or even slightly improving accuracy. When applied to…
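To make the idea of an auxiliary verification task concrete, here is a minimal sketch of how one supervised example might be assembled: the model reads a problem together with a candidate CoT solution and is trained to emit a correctness label. The prompt wording, the `VerificationExample` type, the `build_verification_example` helper, and the "Correct"/"Incorrect" label strings are all assumptions made for illustration; the abstract does not specify the template or data format the authors use.

```python
from dataclasses import dataclass


@dataclass
class VerificationExample:
    prompt: str  # text the model reads (problem + candidate solution)
    target: str  # label the model is trained to produce


def build_verification_example(
    question: str, cot_solution: str, is_correct: bool
) -> VerificationExample:
    """Assemble one (prompt, target) pair for a verification task.

    Hypothetical template: the real prompt format is not given in the source.
    """
    prompt = (
        "Problem:\n" + question.strip() + "\n\n"
        "Candidate solution:\n" + cot_solution.strip() + "\n\n"
        "Is the candidate solution correct? Answer Correct or Incorrect."
    )
    target = "Correct" if is_correct else "Incorrect"
    return VerificationExample(prompt=prompt, target=target)


# Toy data: one correct and one incorrect solution to the same problem.
examples = [
    build_verification_example(
        "What is 2 + 3?", "2 + 3 = 5, so the answer is 5.", True
    ),
    build_verification_example(
        "What is 2 + 3?", "2 + 3 = 6, so the answer is 6.", False
    ),
]

for ex in examples:
    print(ex.target)
```

In this sketch the training signal comes only from the correctness label, so no shortened reasoning chains need to be synthesized for the original task, which is the contrast with conventional methods that the abstract draws.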
