Dual Consensus: Escaping from Spurious Majority in Unsupervised RLVR via Two-Stage Vote Mechanism

Kaixuan Du; Meng Cao; Hang Zhang; Yukun Wang; Xiangzhou Huang; Ni Li

arXiv:2603.16223 · cs.LG · March 18, 2026


TL;DR

This paper introduces Dual Consensus Reinforcement Learning (DCRL), a self-supervised method that improves large language model reasoning by using a two-stage consensus mechanism to generate reliable training signals without external supervision.

Contribution

DCRL is a novel two-stage self-supervised training approach that keeps training from converging on spurious majority answers and improves reasoning performance in large language models.

Findings

1. Consistently improves Pass@1 over the majority-vote baseline across eight benchmarks.
2. Yields more stable training dynamics.
3. Establishes a scalable path for label-free reasoning enhancement.

Abstract

Current label-free RLVR approaches for large language models (LLMs), such as TTRL and Self-reward, have proven effective at improving LLM performance on complex reasoning tasks. However, these methods rely heavily on accurate pseudo-label estimation and converge on spurious yet popular answers, becoming trapped in a dominant mode that limits further improvement. To address this, we propose Dual Consensus Reinforcement Learning (DCRL), a novel self-supervised training method that generates more reliable learning signals through a two-stage consensus mechanism. The model first acts as an anchor, producing dominant responses; it then serves as an explorer, generating diverse auxiliary signals via a temporary unlearning process. The final training target is derived from the harmonic mean of these two signal sets. Notably, the process operates…
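The abstract does not say exactly how the harmonic mean turns the two signal sets into a training target. The sketch below is one plausible reading, not the authors' implementation. It assumes that each signal set is summarized as per-answer vote shares over sampled final answers and that the pseudo-label is the answer with the highest harmonic mean of its anchor share and explorer share. `majority_vote_label` stands in for the single-stage majority-vote pseudo-labeling used by methods like TTRL. All function names, the binary reward, and the `None` fallback are hypothetical, and the temporary unlearning step that produces the explorer samples is not modeled: explorer answers are simply passed in.

```python
from __future__ import annotations

from collections import Counter
from typing import Optional, Sequence


def vote_shares(answers: Sequence[str]) -> dict[str, float]:
    """Fraction of sampled responses that end in each final answer."""
    counts = Counter(answers)
    total = len(answers)
    return {ans: n / total for ans, n in counts.items()}


def majority_vote_label(answers: Sequence[str]) -> str:
    """Baseline pseudo-label: the most frequent answer (ties go to the first seen)."""
    return Counter(answers).most_common(1)[0][0]


def dual_consensus_label(
    anchor_answers: Sequence[str], explorer_answers: Sequence[str]
) -> Optional[str]:
    """Hypothetical rule: maximize the harmonic mean of anchor and explorer shares.

    Returns None when no answer appears in both sets (every score is 0).
    """
    p_anchor = vote_shares(anchor_answers)
    p_explorer = vote_shares(explorer_answers)
    best: Optional[str] = None
    best_score = 0.0
    # sorted() makes tie-breaking deterministic across runs.
    for ans in sorted(p_anchor.keys() | p_explorer.keys()):
        a = p_anchor.get(ans, 0.0)
        e = p_explorer.get(ans, 0.0)
        # Each answer comes from at least one set, so a + e > 0.
        # The harmonic mean is 0 whenever either share is 0.
        score = 2 * a * e / (a + e)
        if score > best_score:
            best, best_score = ans, score
    return best


def binary_rewards(answers: Sequence[str], label: Optional[str]) -> list[float]:
    """Verifiable-style reward: 1.0 if a response matches the pseudo-label, else 0.0."""
    return [1.0 if label is not None and ans == label else 0.0 for ans in answers]


# Toy example: the anchor's majority ("42") gets little support from the explorer.
anchor = ["42"] * 5 + ["17"] * 3
explorer = ["17"] * 4 + ["42"] * 1 + ["9"] * 3
print(majority_vote_label(anchor))             # 42
print(dual_consensus_label(anchor, explorer))  # 17
```

Under this reading, an answer needs support from both roles to win. In the toy example the anchor's majority "42" scores about 0.21, while "17" scores about 0.43 because the explorer also favors it. The abstract does not state whether DCRL uses vote shares, some other per-answer statistic, or a different reward shape.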


Taxonomy

Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Advanced Graph Neural Networks