Trust, But Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards
Xiaoyuan Liu, Tian Liang, Zhiwei He, Jiahao Xu, Wenxuan Wang, Pinjia He, Zhaopeng Tu, Haitao Mi, Dong Yu

TL;DR
This paper introduces RISE, an online reinforcement learning framework that enhances large language models' problem-solving and self-verification abilities simultaneously, leading to more accurate and self-aware reasoning in mathematical tasks.
Contribution
The paper presents RISE, a novel integrated RL approach that trains LLMs to improve reasoning and self-verification concurrently using verifiable rewards.
Findings
RISE improves problem-solving accuracy on mathematical benchmarks.
Models exhibit more frequent and accurate self-verification behaviors.
Online verification and increased verification compute enhance model robustness.
Abstract
Large Language Models (LLMs) show great promise in complex reasoning, with Reinforcement Learning with Verifiable Rewards (RLVR) being a key enhancement strategy. However, a prevalent issue is ``superficial self-reflection'', where models fail to robustly verify their own outputs. We introduce RISE (Reinforcing Reasoning with Self-Verification), a novel online RL framework designed to tackle this. RISE explicitly and simultaneously trains an LLM to improve both its problem-solving and self-verification abilities within a single, integrated RL process. The core mechanism involves leveraging verifiable rewards from an outcome verifier to provide on-the-fly feedback for both solution generation and self-verification tasks. In each iteration, the model generates solutions, then critiques its own on-policy generated solutions, with both trajectories contributing to the policy update.…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
Taxonomy
TopicsTopic Modeling · Explainable Artificial Intelligence (XAI) · Reinforcement Learning in Robotics
