Tackling Length Inflation Without Trade-offs: Group Relative Reward Rescaling for Reinforcement Learning

Zichao Li; Jie Lou; Fangchen Dong; Zhiyuan Fan; Mengjie Ren; Hongyu Lin; Xianpei Han; Debing Zhang; Le Sun; Yaojie Lu; Xing Yu

arXiv:2603.10535 · cs.LG · March 12, 2026

TL;DR

This paper introduces Group Relative Reward Rescaling (GR³), a novel method to control length inflation in reinforcement learning for language models, maintaining performance while reducing verbosity.

Contribution

The paper proposes GR³, a generalized, reward-dependent gating mechanism with regularization and calibration to effectively mitigate length inflation without loss of training quality.

Findings

01

GR³ effectively reduces length inflation in RLHF and RLVR settings.

02

GR³ maintains training dynamics and downstream performance comparable to those of standard methods.

03

GR³ outperforms existing length-regularized baselines.

Abstract

Reinforcement learning significantly enhances LLM capabilities but suffers from a critical issue: length inflation, where models adopt verbosity or inefficient reasoning to maximize rewards. Prior approaches struggle to address this challenge in a general and lossless manner, primarily because additive penalties introduce a compensatory effect that creates optimization shortcuts, while heuristic gating strategies lack generality beyond binary feedback. To bridge this gap, we present Group Relative Reward Rescaling (GR³), which reframes length control as a multiplicative rescaling paradigm, effectively establishing a generalized, continuous, and reward-dependent gating mechanism. To further ensure lossless optimization, we incorporate group-relative regularization and advantage-aware calibration, which dynamically adapt length budgets to instance difficulty and preserve the advantage…
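The contrast between additive penalties and multiplicative rescaling can be illustrated with a small sketch. This is not the paper's GR³ formula, which the abstract does not state. The function names, the `lam` and `alpha` coefficients, the scaling form `1 / (1 + alpha * excess)`, and the use of the group's mean length as a length budget are all assumptions chosen for illustration. The sketch shows one reading of the compensatory effect: under an additive penalty, a short response with zero reward can outscore a long response with full reward, whereas a multiplicative factor leaves a zero reward at zero no matter how short the response is.

```python
from statistics import mean


def additive_penalty(rewards, lengths, lam=0.001):
    """Hypothetical additive baseline: subtract lam * length from each reward."""
    return [r - lam * n for r, n in zip(rewards, lengths)]


def multiplicative_rescale(rewards, lengths, alpha=0.5):
    """Illustrative multiplicative rescaling (an assumption, not the GR3 formula).

    Each reward is multiplied by a factor in (0, 1]. The factor depends on
    how far the response length exceeds the mean length of its group, so
    the budget adapts to the group sampled for the same prompt.
    """
    budget = mean(lengths)  # assumed group-relative length budget
    scaled = []
    for r, n in zip(rewards, lengths):
        # Responses at or below the budget are left unchanged.
        excess = max(0.0, n / budget - 1.0)
        scaled.append(r / (1.0 + alpha * excess))
    return scaled


# One group of sampled responses for a single prompt:
# correct and short, correct and long, wrong and very short.
rewards = [1.0, 1.0, 0.0]
lengths = [300, 1500, 100]

add = additive_penalty(rewards, lengths)
mul = multiplicative_rescale(rewards, lengths)

# Additive: the wrong short response outranks the correct long one.
print("additive:", add, "wrong-short beats correct-long:", add[2] > add[1])
# Multiplicative: the wrong response stays at zero, below both correct ones.
print("multiplicative:", mul, "ordering preserved:", mul[0] > mul[1] > mul[2])
```

Because the multiplicative factor is applied to the reward itself, brevity cannot earn credit that the reward did not grant. That is the sense in which such a scheme acts as a reward-dependent gate in this sketch.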

Taxonomy

Topics: Reinforcement Learning in Robotics · Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning