CoVerRL: Breaking the Consensus Trap in Label-Free Reasoning via Generator-Verifier Co-Evolution

Teng Pan; Yuchen Yan; Zixuan Wang; Ruiqing Zhang; Guiyang Hou; Wenqi Zhang; Weiming Lu; Jun Xiao; and Yongliang Shen

arXiv:2603.17775 · cs.CL · March 24, 2026

Open Access

TL;DR

CoVerRL is a co-evolution framework in which a single model alternates between generator and verifier roles, each improving the other, to escape the consensus trap in label-free reasoning. The result is higher reasoning accuracy and stronger self-verification in large language models.

Contribution

The paper proposes a novel generator-verifier co-evolution approach that mitigates the consensus trap in label-free reinforcement learning for reasoning tasks.

Findings

01. Outperforms label-free baselines by 4.7-5.9% on reasoning benchmarks.

02. Self-verification accuracy improves from 55% to over 85%.

03. Demonstrates effective co-evolution of reasoning and verification capabilities.

Abstract

Label-free reinforcement learning enables large language models to improve reasoning capabilities without ground-truth supervision, typically by treating majority-voted answers as pseudo-labels. However, we identify a critical failure mode: as training maximizes self-consistency, output diversity collapses, causing the model to confidently reinforce systematic errors that evade detection. We term this the consensus trap. To escape it, we propose CoVerRL, a framework where a single model alternates between generator and verifier roles, with each capability bootstrapping the other. Majority voting provides noisy but informative supervision for training the verifier, while the improving verifier progressively filters self-consistent errors from pseudo-labels. This co-evolution creates a virtuous cycle that maintains high reward accuracy throughout training. Experiments across Qwen and…
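The pseudo-labeling loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the binary reward, and the rule of discarding a question when the verifier rejects the consensus answer are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import Counter
from typing import Callable, List, Optional, Tuple


def majority_vote(answers: List[str]) -> Tuple[str, float]:
    """Return the most frequent sampled answer and its vote share."""
    counts = Counter(answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(answers)


def verifier_targets(answers: List[str]) -> List[Tuple[str, bool]]:
    """Noisy verifier supervision: an answer is marked correct
    if it agrees with the majority vote (assumed labeling rule)."""
    consensus, _share = majority_vote(answers)
    return [(a, a == consensus) for a in answers]


def filtered_pseudo_label(
    answers: List[str],
    verifier: Callable[[str], bool],
) -> Optional[str]:
    """Keep the majority answer as a pseudo-label only if the verifier
    accepts it; otherwise return None (hypothetical filtering rule)."""
    consensus, _share = majority_vote(answers)
    return consensus if verifier(consensus) else None


def generator_rewards(answers: List[str], label: Optional[str]) -> List[float]:
    """Binary reward per sample; a filtered-out question gives no signal."""
    if label is None:
        return [0.0] * len(answers)
    return [1.0 if a == label else 0.0 for a in answers]


# Toy example: four sampled answers to one question.
samples = ["12", "12", "12", "7"]

# A stand-in verifier that happens to reject the self-consistent answer "12".
def toy_verifier(answer: str) -> bool:
    return answer != "12"

label = filtered_pseudo_label(samples, toy_verifier)
rewards = generator_rewards(samples, label)
```

In this toy run the verifier rejects the consensus answer, so the question contributes no reward. Without the filter, the same samples would reward the three agreeing answers, which is the self-reinforcing behavior the abstract calls the consensus trap.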

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques