From Absolute to Relative: Rethinking Reward Shaping in Group-Based Reinforcement Learning

Wenzhe Niu; Wei He; Zongxia Xie; Jinpeng Ou; Huichuan Fan; Yuchen Ge; Yanru Sun; Ziyin Wang; Yizhao Sun; Chengshun Shi; Jiuchong Gao; Jinghua Hao; Renqing He

arXiv:2601.23058 · cs.LG · February 2, 2026

Open Access

TL;DR

This paper introduces RLRR, a reward shaping framework that shifts from absolute to relative rewards in group-based reinforcement learning, improving robustness and performance in reasoning and open-ended tasks.

Contribution

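The paper proposes RLRR and the Ranking Reward Model, which replace absolute numerical rewards with relative rankings to address two issues in group-based reinforcement learning: sparse supervision in verifiable tasks and unstable reward-model score ranges in open-ended tasks.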

Findings

01. RLRR improves performance over standard baselines.

02. The Ranking Reward Model effectively generates relative rankings.

03. RLRR is more robust on reasoning and open-ended tasks.

Abstract

Reinforcement learning has become a cornerstone for enhancing the reasoning capabilities of Large Language Models, where group-based approaches such as GRPO have emerged as efficient paradigms that optimize policies by leveraging intra-group performance differences. However, these methods typically rely on absolute numerical rewards, introducing intrinsic limitations. In verifiable tasks, identical group evaluations often result in sparse supervision, while in open-ended scenarios, the score range instability of reward models undermines advantage estimation based on group means. To address these limitations, we propose Reinforcement Learning with Relative Rewards (RLRR), a framework that shifts reward shaping from absolute scoring to relative ranking. Complementing this framework, we introduce the Ranking Reward Model, a listwise preference model tailored for group-based optimization to…
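
The two limitations named in the abstract can be illustrated with a small sketch. It is not the paper's implementation: `group_advantages` is a generic mean/std group normalization in the style of GRPO, and `rank_rewards` is a hypothetical stand-in for a relative reward that maps each response's within-group rank to [0, 1]. The function names, the tie-breaking rule, and the example scores are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-8):
    """Group-normalized advantage: (reward - group mean) / (group std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def rank_rewards(scores):
    """Hypothetical relative reward: within-group rank scaled to [0, 1].

    Ties are broken by position in the list, a simplification chosen
    only to keep the sketch short.
    """
    n = len(scores)
    if n < 2:
        return [0.0] * n
    order = sorted(range(n), key=lambda i: scores[i])  # indices, worst to best
    rewards = [0.0] * n
    for rank, i in enumerate(order):
        rewards[i] = rank / (n - 1)
    return rewards


# Sparse supervision: identical absolute rewards give zero advantage everywhere.
identical = [1.0, 1.0, 1.0, 1.0]
sparse = group_advantages(identical)

# Score-range instability: a strictly increasing but nonlinear distortion of
# the scores changes the normalized advantages but leaves the ranks untouched.
scores = [0.2, 0.9, 0.5, 0.7]
distorted = [s ** 3 for s in scores]
adv_before = group_advantages(scores)
adv_after = group_advantages(distorted)
rank_before = rank_rewards(scores)
rank_after = rank_rewards(distorted)
```

In this sketch `sparse` is all zeros, so the group contributes no learning signal, while `rank_before` and `rank_after` are identical even though `adv_before` and `adv_after` differ. Mean/std normalization already absorbs affine rescaling of scores, which is why the example uses a nonlinear distortion.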

Taxonomy

Topics: Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications