Design Conditions for Intra-Group Learning of Sequence-Level Rewards: Token Gradient Cancellation

Fei Ding; Yongkang Zhang; youwei wang; Zijian Zeng

arXiv:2604.13088·cs.LG·April 16, 2026

TL;DR

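The paper derives a necessary design condition for intra-group reinforcement learning objectives: gradients must stay exchangeable across token updates so that contributions on weak-credit, high-frequency tokens cancel, which prevents reward-irrelevant drift. It then proposes minimal intra-group transformations that restore or approximate this cancellation, reporting better training stability and final performance.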

Contribution

The paper introduces a design condition for intra-group objectives based on gradient exchangeability across token updates, and proposes minimal intra-group transformations that enforce or approximate it, with the aim of more stable and more efficient training.

Findings

01. Transformations restore gradient cancellation in token space.

02. Training stability and sample efficiency are improved.

03. Final model performance is enhanced.

Abstract

Under sparse termination rewards, intra-group comparison has become the dominant paradigm for fine-tuning reasoning models via reinforcement learning. However, long training runs often suffer from ineffective update accumulation (a "learning tax"), solution probability drift, and entropy collapse. This paper presents a necessary condition for algorithm design from a token-level credit assignment perspective: to prevent reward-irrelevant drift, intra-group objectives must maintain gradient exchangeability across token updates, which enables gradient cancellation on weak-credit, high-frequency tokens. We show that two common mechanisms that disrupt exchangeability make "non-cancellation" a structural norm. Based on this, we propose minimal intra-group transformations that restore or approximate the cancellation structure in the shared token space. Experimental results demonstrate that these…
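The cancellation idea in the abstract can be illustrated with a toy calculation. The sketch below is not the paper's method. It assumes group-mean-centered advantages (a common intra-group baseline, assumed here) and assumes that every occurrence of a shared token contributes the same gradient direction, so the net update on that token is proportional to a single scalar coefficient. The function names, rewards, and response lengths are hypothetical, and the per-sequence 1/length weighting is used only as one example of unequal weights; it is not claimed to be one of the two mechanisms the paper analyzes.

```python
# Toy illustration (hypothetical names and numbers, not the paper's algorithm).
# A shared, reward-irrelevant token appears once in every response of a group.
# If each occurrence contributes the same gradient direction g, the net update
# on that token is coefficient * g, where the coefficient is computed below.

def centered_advantages(rewards):
    """Advantages relative to the group mean; they sum to zero by construction."""
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]

def shared_token_coefficient(rewards, weights, counts):
    """Sum over responses of advantage * per-sequence weight * token count."""
    adv = centered_advantages(rewards)
    return sum(a * w * c for a, w, c in zip(adv, weights, counts))

rewards = [1.0, 0.0, 0.0, 1.0]   # sparse sequence-level rewards (assumed)
counts = [1, 1, 1, 1]            # the shared token occurs once per response
lengths = [10, 40, 20, 80]       # response lengths in tokens (assumed)

# Equal weights: contributions are exchangeable and cancel.
uniform = [1.0] * len(rewards)
cancel = shared_token_coefficient(rewards, uniform, counts)

# Unequal per-sequence weights (here 1/length): cancellation is lost.
length_norm = [1.0 / n for n in lengths]
drift = shared_token_coefficient(rewards, length_norm, counts)

print(f"equal weights:   {cancel:.5f}")
print(f"1/length weights: {drift:.5f}")
```

With equal weights the coefficient is zero, so the shared token receives no net update. With unequal weights it is nonzero even though the token carries no information about the reward, which is the kind of reward-irrelevant drift the abstract describes.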
