Incentivizing LLMs to Self-Verify Their Answers

Fuxiang Zhang; Jiacheng Xu; Chaojie Wang; Ce Cui; Yang Liu; Bo An

arXiv:2506.01369 · cs.LG · October 31, 2025

TL;DR

This paper introduces a self-verification framework for LLMs that trains models to assess their own answers, leading to improved reasoning performance and effective test-time scaling without external reward models.

Contribution

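The paper proposes a unified reinforcement learning approach in which a single model both generates answers and verifies them. This avoids the distribution mismatch between a task-specific post-trained generator and a general-purpose external reward model, and it improves reasoning accuracy.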
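This page does not spell out the reward design, so the following is only a minimal sketch of how one RL process could score both roles of the same model. It assumes binary rewards, an exact-match answer checker, and a boolean self-verdict parsed from the model's output; `Rollout`, `generation_reward`, `verification_reward`, and `unified_rewards` are hypothetical names, not the authors' code.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Rollout:
    answer: str              # final answer extracted from the model's solution
    verdict: Optional[bool]  # model's own verdict on that answer; None if unparsable


def generation_reward(answer: str, reference: str) -> float:
    """Binary correctness reward (exact string match is an assumption)."""
    return 1.0 if answer.strip() == reference.strip() else 0.0


def verification_reward(verdict: Optional[bool], answer_correct: bool) -> float:
    """Reward the self-verdict for agreeing with the answer's true correctness."""
    if verdict is None:
        return 0.0  # treat a missing or malformed verdict as wrong
    return 1.0 if verdict == answer_correct else 0.0


def unified_rewards(rollouts: List[Rollout], reference: str) -> List[Tuple[float, float]]:
    """Return (generation_reward, verification_reward) for each rollout."""
    rewards = []
    for r in rollouts:
        gen = generation_reward(r.answer, reference)
        ver = verification_reward(r.verdict, answer_correct=gen == 1.0)
        rewards.append((gen, ver))
    return rewards


rollouts = [
    Rollout("42", True),   # right answer, correctly endorsed
    Rollout("41", True),   # wrong answer, wrongly endorsed
    Rollout("41", False),  # wrong answer, correctly rejected
    Rollout("42", None),   # right answer, no verdict
]
print(unified_rewards(rollouts, reference="42"))
# → [(1.0, 1.0), (0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
```

In this sketch a wrong answer that the model correctly flags as wrong still earns verification reward, which is what would push the model toward honest self-assessment rather than always claiming success.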

Findings

01. Models trained with self-verification outperform baseline models.

02. Self-verification enables effective test-time scaling.

03. The approach generalizes across different reasoning tasks.

Abstract

Large Language Models (LLMs) have demonstrated remarkable progress in complex reasoning tasks through both post-training and test-time scaling laws. While prevalent test-time scaling approaches are often realized by using external reward models to guide the model generation process, we find that only marginal gains can be acquired when scaling a model post-trained on specific reasoning tasks. We identify that the limited improvement stems from distribution discrepancies between the specific post-trained generator and the general reward model. To address this, we propose a framework that incentivizes LLMs to self-verify their own answers. By unifying answer generation and verification within a single reinforcement learning (RL) process, we train models that can effectively assess the correctness of their own solutions. The trained model can further scale its performance at inference time…
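The abstract is cut off before it describes how the trained model scales at inference time, so the following is a generic sketch rather than the paper's procedure: sample several answers, let the same model score each one, and aggregate. The `(answer, self_score)` pairs and all function names are hypothetical, and the self-scores are assumed to be the model's estimated probability that an answer is correct.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Each candidate is (answer, self_score): an answer sampled from the model and the
# same model's estimated probability (hypothetical, in [0, 1]) that it is correct.
Candidate = Tuple[str, float]


def majority_vote(candidates: List[Candidate]) -> str:
    """Baseline: most frequent answer, ignoring self-verification scores."""
    return Counter(a for a, _ in candidates).most_common(1)[0][0]


def best_of_n(candidates: List[Candidate]) -> str:
    """Keep the single candidate the model rates highest."""
    return max(candidates, key=lambda c: c[1])[0]


def self_verified_vote(candidates: List[Candidate]) -> str:
    """Weighted vote: sum self-verification scores per distinct answer."""
    if not candidates:
        raise ValueError("need at least one candidate")
    totals: Dict[str, float] = defaultdict(float)
    for answer, score in candidates:
        totals[answer] += score
    return max(totals, key=totals.__getitem__)  # ties go to the first-seen answer


samples = [("12", 0.2), ("12", 0.1), ("12", 0.15), ("15", 0.9), ("15", 0.8)]
print(majority_vote(samples))       # → 12
print(best_of_n(samples))           # → 15
print(self_verified_vote(samples))  # → 15
```

Here the self-scores let two confident samples outweigh three low-confidence ones that plain majority voting would pick; drawing more samples gives the aggregation more evidence, which is one way a self-verifying model could trade extra inference compute for accuracy without an external reward model.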

Videos

Incentivizing LLMs to Self-Verify Their Answers (SlidesLive)
