Beyond Majority Voting: Towards Fine-grained and More Reliable Reward Signal for Test-Time Reinforcement Learning

Weiqin Wang; Yile Wang; Kehao Chen; Hui Huang

arXiv:2512.15146 · cs.CL · May 7, 2026

TL;DR

This paper introduces SCOPE, a confidence-weighted pseudo-labeling framework for test-time reinforcement learning that improves reasoning and exploration, outperforming recent methods on multiple benchmarks.

Contribution

SCOPE integrates confidence estimation and dynamic subgroup partitioning to enhance pseudo-label quality and exploration in test-time reinforcement learning.
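A minimal sketch of subgroup-specific pseudo-labeling follows. It assumes each rollout already carries a scalar confidence score and that the reward is binary agreement with a pseudo-label. The helper names (`weighted_label`, `subgroup_rewards`) and the fixed-size, contiguous split are hypothetical placeholders. SCOPE partitions the pool dynamically, and this page does not describe how. The sketch shows only one effect: when each subgroup is labeled independently, a minority answer can still earn reward, whereas a single global label would give none of its paths any reward.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def weighted_label(answers: Sequence[str], confidences: Sequence[float]) -> str:
    """Pick the answer whose supporting paths have the largest total confidence."""
    scores: Dict[str, float] = defaultdict(float)
    for answer, conf in zip(answers, confidences):
        scores[answer] += conf
    return max(scores, key=scores.get)


def subgroup_rewards(
    answers: Sequence[str],
    confidences: Sequence[float],
    subgroup_size: int,
) -> Tuple[List[str], List[float]]:
    """Split rollouts into contiguous subgroups (a hypothetical stand-in for
    SCOPE's dynamic partitioning), estimate one pseudo-label per subgroup,
    and give each rollout a binary reward against its own subgroup's label."""
    labels: List[str] = []
    rewards: List[float] = []
    for start in range(0, len(answers), subgroup_size):
        sub_answers = answers[start:start + subgroup_size]
        sub_confs = confidences[start:start + subgroup_size]
        label = weighted_label(sub_answers, sub_confs)
        labels.append(label)
        rewards.extend(1.0 if a == label else 0.0 for a in sub_answers)
    return labels, rewards


if __name__ == "__main__":
    answers = ["A", "A", "B", "A", "B", "B"]
    confidences = [0.9, 0.8, 0.4, 0.5, 0.9, 0.7]
    # Global label: "A" wins with roughly 2.2 against 2.0 for "B".
    print(weighted_label(answers, confidences))
    # Two subgroups of three: labels "A" and "B".
    print(subgroup_rewards(answers, confidences, 3))
```

With this example data, a single global label is "A", so all three "B" paths receive zero reward. Splitting the pool into two subgroups of three produces the labels "A" and "B", so two of the "B" paths still receive a reward.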

Findings

1. SCOPE achieves a 13.1% relative improvement on AIME 2025.
2. SCOPE outperforms recent baselines across various benchmarks.
3. The code is publicly available at https://github.com/szu-tera/SCOPE.

Abstract

Test-time reinforcement learning reduces reliance on annotated data by using majority-voting results as pseudo-labels, and has emerged as a complementary direction to reinforcement learning with verifiable rewards (RLVR) for improving reasoning ability. However, this voting strategy often induces confirmation bias and suffers from sparse rewards, which limits overall performance. In this work, we propose subgroup-specific step-wise confidence-weighted pseudo-label estimation (SCOPE), a framework that integrates model confidence and dynamic subgroup partitioning to address these issues. Specifically, SCOPE incorporates the proposed step-wise confidence into pseudo-label estimation, prioritizing high-quality reasoning paths over simple frequency counts. Furthermore, it dynamically partitions the pool of candidate outputs into independent subgroups by balancing reasoning quality against exploration…
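The following sketch contrasts majority voting with confidence-weighted pseudo-labels. It illustrates the idea under stated assumptions and is not SCOPE's implementation. Each reasoning path is assumed to come with per-step confidence scores in [0, 1]. The hypothetical `path_confidence` aggregates these scores by taking the minimum, so a path is treated as only as reliable as its weakest step. The abstract does not specify how step-wise confidence is computed or aggregated.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence


def majority_vote(answers: Sequence[str]) -> str:
    """Baseline pseudo-label: the most frequent final answer."""
    return Counter(answers).most_common(1)[0][0]


def path_confidence(step_confidences: Sequence[float]) -> float:
    """Hypothetical aggregation: the weakest step bounds the whole path."""
    return min(step_confidences)


def confidence_weighted_vote(
    answers: Sequence[str], step_confidences: Sequence[Sequence[float]]
) -> str:
    """Pseudo-label = the answer with the largest summed path confidence."""
    scores: Dict[str, float] = defaultdict(float)
    for answer, steps in zip(answers, step_confidences):
        scores[answer] += path_confidence(steps)
    return max(scores, key=scores.get)


def binary_rewards(answers: Sequence[str], pseudo_label: str) -> List[float]:
    """Reward 1.0 for paths that agree with the pseudo-label, else 0.0."""
    return [1.0 if a == pseudo_label else 0.0 for a in answers]


if __name__ == "__main__":
    answers = ["12", "12", "12", "7", "7"]
    steps = [[0.9, 0.2], [0.8, 0.3], [0.7, 0.25], [0.95, 0.9], [0.9, 0.85]]
    print(majority_vote(answers))                    # 12
    print(confidence_weighted_vote(answers, steps))  # 7 (about 0.75 vs 1.75)
    print(binary_rewards(answers, confidence_weighted_vote(answers, steps)))
```

In this example, three low-confidence paths agree on "12" and would win a plain vote, while two high-confidence paths that agree on "7" win the weighted vote. The reward stays binary, so this sketch alone does not address the sparse-reward issue the abstract mentions.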

Code & Models

Repository: szu-tera/SCOPE (GitHub, https://github.com/szu-tera/SCOPE)
