Beyond Uniform Credit Assignment: Selective Eligibility Traces for RLVR

Chaoli Mou; Zhan Zhuang; Xinning Chen; Yu Zhang

arXiv:2605.05965·cs.LG·May 8, 2026


TL;DR

This paper introduces Selective Eligibility Traces (S-trace), a novel method for fine-grained credit assignment in reinforcement learning with verifiable rewards, improving efficiency and performance over existing critic-free algorithms.

Contribution

The paper proposes S-trace, a sparse eligibility-trace mechanism that improves the precision and efficiency of credit assignment, and contextualizes GSPO within this framework.

Findings

01 S-trace outperforms GRPO on multiple Qwen models with up to 3.16% gains.

02 S-trace achieves higher sample and token efficiency.

03 S-trace maintains robust improvements when scaled to larger models.

Abstract

Reinforcement Learning with Verifiable Rewards (RLVR) has become a key approach for improving the reasoning abilities of large language models. However, widely used critic-free algorithms such as Group Relative Policy Optimization (GRPO) rely on a "uniform credit assignment" assumption that indiscriminately broadcasts trajectory-level advantages to every token, hindering learning efficiency by failing to distinguish critical reasoning steps. To address this limitation, we propose Selective Eligibility Traces (S-trace). Grounded in the intuition of partial trust region preservation, we first introduce P-trace, a sample-efficient, critic-free eligibility-trace method. We then build S-trace on top of it: a sparse eligibility-trace mechanism that further mitigates variance and achieves fine-grained credit assignment by selectively masking low-entropy tokens. Theoretically, we contextualize…
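The contrast the abstract draws between uniform credit assignment and selective masking of low-entropy tokens can be sketched in a few lines. This is a toy illustration only, not the authors' implementation: it does not model P-trace, the eligibility-trace recursion, or any trust-region term, none of which are specified in the text above. The function names, the entropy `threshold`, and the toy distributions are all hypothetical, and the group standardization follows the commonly described GRPO form.

```python
import math
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: standardize each reward within its sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def uniform_credit(advantage, num_tokens):
    """Uniform credit assignment: every token gets the trajectory-level advantage."""
    return [advantage] * num_tokens


def entropy_masked_credit(advantage, token_dists, threshold):
    """Hypothetical selective variant: zero the credit on low-entropy tokens.

    `threshold` is an illustrative hyperparameter, not a value from the paper.
    """
    return [advantage if token_entropy(d) >= threshold else 0.0 for d in token_dists]


# Toy group of two sampled responses with verifiable 0/1 rewards.
rewards = [1.0, 0.0]
advantages = group_relative_advantages(rewards)

# Toy next-token distributions for the first response (three tokens).
dists = [
    [0.98, 0.01, 0.01],  # confident prediction, low entropy
    [0.40, 0.35, 0.25],  # uncertain prediction, high entropy
    [1.00, 0.00, 0.00],  # deterministic prediction, zero entropy
]

uniform = uniform_credit(advantages[0], len(dists))
selective = entropy_masked_credit(advantages[0], dists, threshold=0.5)

print("uniform:  ", uniform)
print("selective:", selective)
```

In this toy setting, the uniform scheme gives all three tokens the same credit, while the masked scheme keeps credit only on the second, high-entropy token. How S-trace actually combines masking with its traces is left to the paper itself.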
