VI-CuRL: Stabilizing Verifier-Independent RL Reasoning via Confidence-Guided Variance Reduction

Xin-Qiang Cai; Masashi Sugiyama

arXiv:2602.12579·cs.LG·February 16, 2026

TL;DR

This paper introduces VI-CuRL, a verifier-independent reinforcement learning framework that uses model confidence to stabilize training and improve reasoning in large language models, eliminating reliance on external verifiers.

Contribution

We develop a confidence-guided curriculum reinforcement learning method that reduces variance and enhances stability without external verifiers, supported by theoretical guarantees and empirical results.

Findings

01 VI-CuRL outperforms verifier-independent baselines on six benchmarks.

02 The method reduces gradient variance and prevents training collapse.

03 The estimator is proven to be asymptotically unbiased.

Abstract

Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a dominant paradigm for enhancing the reasoning of Large Language Models (LLMs), yet its reliance on external verifiers limits its scalability. Recent findings suggest that RLVR primarily functions by eliciting latent capabilities, motivating the development of verifier-free algorithms. However, in such settings, standard methods like Group Relative Policy Optimization (GRPO) face a critical challenge: destructive gradient variance that often leads to training collapse. To address this issue, we introduce Verifier-Independent Curriculum Reinforcement Learning (VI-CuRL), a framework that leverages the model's intrinsic confidence to construct a curriculum independent of external verifiers. By prioritizing high-confidence samples, VI-CuRL effectively manages the bias-variance trade-off, specifically targeting the reduction of action…
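The abstract's idea of a confidence-based curriculum can be illustrated with a small sketch. This is not the authors' implementation: the abstract shown here is truncated and does not specify how confidence is measured or how the curriculum is scheduled. The sketch assumes, purely for illustration, that confidence is the geometric-mean token probability of a sampled response, and that the retained fraction of samples grows linearly from a starting value to 1.0 over training. All function names and the default `start=0.25` are hypothetical.

```python
import math


def sequence_confidence(token_logprobs):
    """Geometric-mean token probability of one sampled response.

    ASSUMPTION: this confidence measure is chosen for illustration only;
    the source does not state which measure VI-CuRL uses.
    """
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def keep_fraction(step, total_steps, start=0.25):
    """Fraction of samples retained at a given training step.

    ASSUMPTION: a linear schedule from `start` to 1.0, so that early updates
    use only the most confident samples and late updates use all of them.
    """
    progress = min(max(step / total_steps, 0.0), 1.0)
    return start + (1.0 - start) * progress


def select_confident(confidences, fraction):
    """Indices of the top `fraction` of samples by confidence (at least one)."""
    k = max(1, math.ceil(fraction * len(confidences)))
    ranked = sorted(range(len(confidences)),
                    key=lambda i: confidences[i], reverse=True)
    return sorted(ranked[:k])


if __name__ == "__main__":
    # Four hypothetical responses with per-token log-probabilities.
    batch = [
        [-0.1, -0.2, -0.1],
        [-1.5, -2.0, -1.0],
        [-0.3, -0.4, -0.2],
        [-0.9, -1.1, -0.8],
    ]
    confidences = [sequence_confidence(lp) for lp in batch]
    for step in (0, 50, 100):
        frac = keep_fraction(step, total_steps=100)
        print(step, frac, select_confident(confidences, frac))
```

Under these assumptions, restricting early updates to high-confidence samples trades some bias for lower variance, and letting the retained fraction reach 1.0 means the filter eventually stops excluding anything. Whether this matches the paper's actual schedule and estimator has to be checked against the full text.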

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI)