Rewards as Labels: Revisiting RLVR from a Classification Perspective

Zepeng Zhai; Meilin Chen; Jiaxuan Zhao; Junlang Qian; Lei Shen; Yuan Lu

arXiv:2602.05630 · cs.LG · March 11, 2026

TL;DR

This paper introduces Rewards as Labels (REAL), a classification-based framework for reinforcement learning with verifiable rewards (RLVR) that improves training stability and performance over existing methods such as GRPO.

Contribution

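REAL reformulates policy optimization as a classification problem by treating verifiable rewards as categorical labels. This addresses two issues the authors identify in GRPO-style RLVR: gradient misassignment among positive samples and gradient domination by negative samples.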
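A minimal sketch of the "rewards as labels" idea, under stated assumptions: each sampled response gets one scalar logit, and its binary verifier outcome becomes the class label. The logit definition used here (a `beta`-scaled sum of token log-probability differences between the current and old policy) and the helper names `sequence_logit` and `reward_to_label` are hypothetical illustrations, not the paper's definitions.

```python
def sequence_logit(policy_logps: list[float], old_logps: list[float], beta: float = 1.0) -> float:
    """Hypothetical per-response logit: beta * sum of token log-prob differences.

    This definition is an assumption for illustration; REAL's actual logit may differ.
    """
    if len(policy_logps) != len(old_logps):
        raise ValueError("token log-prob lists must have the same length")
    return beta * sum(p - o for p, o in zip(policy_logps, old_logps))


def reward_to_label(reward: float) -> int:
    """Map a binary verifiable reward (1.0 = verified correct, 0.0 = not) to a class label."""
    return 1 if reward > 0 else 0


# Example: one two-token response that the verifier judged correct.
z = sequence_logit([-0.1, -0.5], [-0.2, -0.6])
y = reward_to_label(1.0)
print(round(z, 6), y)  # prints: 0.2 1
```

Under this assumed form, pushing the logit of a correct (label 1) response upward means raising its likelihood relative to the old policy, while the label itself carries no magnitude beyond "correct" or "incorrect".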

Findings

01

REAL outperforms GRPO and its variants on reasoning benchmarks.

02

REAL improves Pass@1 by 6.7% on 1.5B-parameter models.

03

Even with simple binary cross-entropy, REAL surpasses DAPO.
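Binary cross-entropy on such (logit, label) pairs can be sketched as below. This is the standard numerically stable BCE-with-logits formula applied to a group of sampled responses; the names `stable_bce_with_logits` and `real_bce_loss`, and the choice to average over the group, are assumptions rather than the paper's exact objective.

```python
import math


def stable_bce_with_logits(z: float, y: int) -> float:
    """Binary cross-entropy for one logit z and label y in {0, 1}.

    Equals -[y*log(sigmoid(z)) + (1 - y)*log(1 - sigmoid(z))], rewritten as
    max(z, 0) - z*y + log(1 + exp(-|z|)) so large |z| cannot overflow.
    """
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))


def real_bce_loss(logits: list[float], rewards: list[float]) -> float:
    """Mean BCE over one group of responses, using binary rewards as labels (hypothetical helper)."""
    if not logits or len(logits) != len(rewards):
        raise ValueError("need equal-length, non-empty logits and rewards")
    labels = [1 if r > 0 else 0 for r in rewards]
    return sum(stable_bce_with_logits(z, y) for z, y in zip(logits, labels)) / len(logits)


# Two responses at logit 0: one correct, one incorrect; each contributes log(2).
print(real_bce_loss([0.0, 0.0], [1.0, 0.0]))
```

The stable form matters in practice because per-response logits built from summed token log-ratios can grow large in magnitude.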

Abstract

Reinforcement Learning with Verifiable Rewards (RLVR) has recently advanced the capabilities of Large Language Models (LLMs) in complex reasoning tasks by providing explicit rule-based supervision. Among RLVR methods, GRPO and its variants have achieved strong empirical performance. Despite their success, we identify that they suffer from two problems, Gradient Misassignment in Positives and Gradient Domination in Negatives, which lead to inefficient and suboptimal policy updates. To address these issues, we propose Rewards as Labels (REAL), a novel framework that treats verifiable rewards as categorical labels rather than scalar weights, thereby reformulating policy optimization as a classification problem. Building on this, we further introduce anchor logits to enhance policy learning. Our analysis reveals that REAL induces a monotonic and bounded gradient weighting, enabling balanced gradient allocation…
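To make the contrast between scalar weights and labels concrete, the sketch below compares GRPO's group-relative advantage, (r - mean) / std within a group of sampled responses, with the gradient of binary cross-entropy with respect to a logit, sigmoid(z) - y. It uses the population standard deviation (some GRPO implementations use the sample standard deviation). The BCE weight is only an illustration of a monotonic, bounded weighting; it is not claimed to be REAL's exact gradient, which also involves anchor logits that are not modelled here.

```python
import math
import statistics


def grpo_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style group-relative advantages: (r - mean) / (std + eps) within one group."""
    mu = statistics.fmean(rewards)
    sigma = statistics.pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bce_grad_weight(z: float, y: int) -> float:
    """d/dz of BCE-with-logits: sigmoid(z) - y.

    Always within [-1, 1] and monotonically increasing in z; it depends only on
    this response's own logit and label, not on the rest of the group.
    """
    return sigmoid(z) - y


# One correct response out of eight: GRPO's weights depend on the group composition.
adv = grpo_advantages([1.0] + [0.0] * 7)
print([round(a, 3) for a in adv])
```

With one correct response out of eight, the positive's advantage is about sqrt(7) ≈ 2.65 and each negative's about -0.38; these magnitudes shift as the number of correct responses in the group changes, whereas the BCE weight stays within [-1, 1] regardless of the group.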
