Learning to Self-Verify Makes Language Models Better Reasoners

Yuxin Chen; Yu Wang; Yi Zhang; Ziang Ye; Zhengzhou Cai; Yaorui Shi; Qi Gu; Hui Su; Xunliang Cai; Xiang Wang; An Zhang; Tat-Seng Chua

arXiv:2602.07594·cs.CL·February 10, 2026

TL;DR

This paper investigates the asymmetry between generation and self-verification in large language models, revealing that training models to self-verify can enhance their reasoning and generation abilities.

Contribution

The paper introduces a multi-task reinforcement learning framework that jointly optimizes generation and self-verification, improving reasoning performance.
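This summary does not give the reward design, task weighting, or RL algorithm the authors use, so the following is only an illustrative sketch of what jointly rewarding generation and self-verification could look like. It assumes binary exact-match rewards and a fixed mixing weight; the names `Sample`, `generation_reward`, `verification_reward`, `joint_reward`, and `verify_weight` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One rollout: the model answers a problem, then judges its own answer."""
    predicted_answer: str   # final answer extracted from the reasoning trace
    reference_answer: str   # ground-truth answer for the problem
    self_verdict: bool      # model's own claim that its answer is correct


def generation_reward(s: Sample) -> float:
    # Assumption: exact-match correctness gives a binary generation reward.
    return 1.0 if s.predicted_answer.strip() == s.reference_answer.strip() else 0.0


def verification_reward(s: Sample) -> float:
    # Assumption: the verification task is rewarded when the model's verdict
    # agrees with whether its answer was actually correct.
    actually_correct = generation_reward(s) == 1.0
    return 1.0 if s.self_verdict == actually_correct else 0.0


def joint_reward(s: Sample, verify_weight: float = 0.5) -> float:
    # Assumption: the two task rewards are mixed with a fixed weight.
    # The paper's actual multi-task objective may differ.
    gen = generation_reward(s)
    ver = verification_reward(s)
    return (1.0 - verify_weight) * gen + verify_weight * ver


if __name__ == "__main__":
    rollouts = [
        Sample("42", "42", True),    # correct answer, correct verdict
        Sample("41", "42", False),   # wrong answer, but correctly flagged
        Sample("41", "42", True),    # wrong answer, wrongly endorsed
    ]
    for r in rollouts:
        print(r, joint_reward(r))
```

Under these assumptions, a rollout that gets the answer wrong but correctly flags it still earns partial reward, which is one simple way a verification signal can shape training alongside the generation signal.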

Findings

01. Self-verification training improves generation accuracy.

02. Models with integrated self-verification outperform generation-only models.

03. Self-verification enhances reasoning trace efficiency.

Abstract

Recent large language models (LLMs) achieve strong performance in generating promising reasoning paths for complex tasks. However, despite powerful generation ability, LLMs remain weak at verifying their own answers, revealing a persistent capability asymmetry between generation and self-verification. In this work, we conduct an in-depth investigation of this asymmetry throughout training evolution and show that, even on the same task, improving generation does not lead to corresponding improvements in self-verification. Interestingly, we find that the reverse direction of this asymmetry behaves differently: learning to self-verify can effectively improve generation performance, achieving accuracy comparable to standard generation training while yielding more efficient and effective reasoning traces. Building on this observation, we further explore integrating self-verification into…

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI)