The Cancellation Hypothesis in Critic-Free RL: From Outcome Rewards to Token Credits
Tianhao Cheng, Zeyu Huang, Zihan Qiu, Yu Cheng, Edoardo Ponti, Yinghui Xu, Ivan Titov, Zenglin Xu

TL;DR
This paper studies critic-free reinforcement learning for language models at the token level, revealing coupled interactions between tokens and proposing the cancellation hypothesis to explain how sequence-level rewards translate into changes in token probabilities.
Contribution
The paper introduces the cancellation hypothesis, which explains token credit assignment through coupling effects between tokens, and proposes batching interventions to improve critic-free RL training.
Findings
The token-flipping phenomenon: positive and negative rollouts show remarkably similar proportions of tokens whose probabilities are boosted or suppressed during RL training.
Tokens associated with positive rollouts have higher value than those associated with negative ones.
Batching interventions based on the cancellation hypothesis improve RL training across model scales.
Abstract
A commonly accepted explanation of critic-free RL for LLMs, based on sequence-level rewards, is that it reinforces successful rollouts with a positive advantage while penalizing failed ones. In contrast, we study critic-free RL from a token-level perspective, revealing the token-flipping phenomenon: positive and negative rollouts exhibit remarkably similar proportions of tokens whose probabilities are boosted or suppressed during RL training. To explain this phenomenon, we further show that a token's change in probability is not fully determined by its own advantage; coupled gradient interactions with other tokens also play a non-negligible role. Specifically, these token coupling effects occur primarily between identical tokens that are both predicted with low confidence. Building upon this analysis, we propose the cancellation hypothesis: as a result of coupling, opposing signals…
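The claim that a token's probability change is not fully determined by its own advantage can be illustrated with a deliberately small sketch. It is not the paper's setup: it assumes a single context with a tabular softmax policy over four tokens, one token per rollout, and zero-mean sequence-level advantages copied onto each token. The names `softmax`, `logit_gradient`, and the toy data are hypothetical and chosen only for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def logit_gradient(logits, tokens, advantages):
    """Gradient of sum_i A_i * log pi(tokens[i]) w.r.t. the shared logits.

    Assumption: every rollout shares the same context, so all sampled
    tokens are scored by the same softmax distribution.
    """
    pi = softmax(logits)
    grad = [0.0] * len(logits)
    for tok, adv in zip(tokens, advantages):
        for v in range(len(logits)):
            indicator = 1.0 if v == tok else 0.0
            grad[v] += adv * (indicator - pi[v])
    return grad

# Toy data (hypothetical): uniform, low-confidence policy over 4 tokens.
logits = [0.0, 0.0, 0.0, 0.0]
# Token 0 is sampled in one positive and one negative rollout.
tokens = [0, 0, 1, 2]
advantages = [1.0, -1.0, 1.0, -1.0]  # zero-mean, sequence-level

grad = logit_gradient(logits, tokens, advantages)

# One gradient-ascent step on the logits, then compare probabilities.
step_size = 0.5
old_pi = softmax(logits)
new_pi = softmax([z + step_size * g for z, g in zip(logits, grad)])
delta = [n - o for n, o in zip(new_pi, old_pi)]

for v in range(len(logits)):
    print(f"token {v}: grad={grad[v]:+.2f}  delta_prob={delta[v]:+.4f}")
```

In this toy, the opposing advantages on the two occurrences of token 0 cancel in the logit gradient, and its probability still falls slightly because token 1 gains mass. The occurrence of token 0 in the positive rollout is therefore suppressed despite its positive advantage, which is the kind of sign mismatch the token-flipping observation describes. In an actual language model the coupling would act through shared parameters across different contexts rather than a single shared logit vector, so this sketch conveys only the intuition.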
