DelTA: Discriminative Token Credit Assignment for Reinforcement Learning from Verifiable Rewards

Kaiyi Zhang; Wei Wu; Yankai Lin

arXiv:2605.21467 · cs.LG · May 21, 2026

TL;DR

This paper introduces DelTA, a token-level credit assignment method for reinforcement learning from verifiable rewards (RLVR) that reweights how individual tokens contribute to policy updates in large language models, making those updates both more interpretable and more effective.

Contribution

DelTA estimates per-token coefficients that amplify discriminative token-gradient directions. This improves learning from verifiable rewards beyond the centroid-based discriminator implicit in standard sequence-level RLVR.
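One way to write where such coefficients could enter the update is sketched below. This is an assumption for illustration, not the paper's stated formula: each token's advantage-weighted gradient is scaled by a hypothetical coefficient c_{i,t}, where A_i is the response-level advantage of response i, g_{i,t} is the gradient of log pi_theta for token t of that response, eta is the step size, and per-response normalization constants are omitted. Setting every coefficient to 1 recovers the standard sequence-level direction. How DelTA actually estimates c_{i,t} is not described in this summary.

```latex
d_c = \sum_{i} A_i \sum_{t} c_{i,t}\, g_{i,t},
\qquad
\Delta \log \pi_\theta(y) \approx \eta \,\langle g_y,\, d_c \rangle,
\qquad
c_{i,t} \equiv 1 \;\Rightarrow\; d_c = \sum_{i} A_i \sum_{t} g_{i,t}.
```

The middle expression is the first-order (Taylor) change in a token's log-probability after the step theta + eta d_c, which is why the update direction behaves like a linear discriminator: tokens whose gradients have a positive inner product with d_c become more likely, and tokens with a negative inner product become less likely.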

Findings

1. DelTA outperforms strong baselines on mathematical benchmarks.
2. DelTA also improves code generation and out-of-domain performance.
3. DelTA makes RLVR updates more contrastive.

Abstract

Reinforcement learning from verifiable rewards (RLVR) has emerged as a central technique for improving the reasoning capabilities of large language models. Despite its effectiveness, how response-level rewards translate into token-level probability changes remains poorly understood. We introduce a discriminator view of RLVR updates, showing that the policy-gradient update direction implicitly acts as a linear discriminator over token-gradient vectors and thereby determines which token probabilities are increased or decreased during learning. Under standard sequence-level RLVR, this discriminator is constructed from positive- and negative-side centroids formed by advantage-weighted averaging of token-gradient vectors. However, such centroid construction can be dominated by shared high-frequency patterns, such as formatting tokens, diluting sparse yet discriminative directions that better…
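The centroid construction and its failure mode can be illustrated with a toy numerical sketch. Everything in it is hypothetical: the 3-dimensional token-gradient vectors, the two responses, and the ±1 GRPO-style advantages are invented for illustration. Normalizing each centroid by its total advantage-weighted token count is likewise an assumed choice, not necessarily the paper's. The sketch computes the sequence-level direction d, checks that it decomposes exactly into weighted positive and negative centroids, and scores each token type by its first-order log-probability change, the inner product of its gradient with d.

```python
import numpy as np

# Hypothetical toy token gradients (d log pi / d theta) in 3 dimensions:
#   axis 0 = shared "formatting" direction
#   axis 1 = sparse, discriminative reasoning direction
#   axis 2 = direction specific to an incorrect step
FORMAT = np.array([1.0, 0.0, 0.0])
KEY = np.array([0.0, 1.0, 0.0])
WRONG = np.array([0.0, 0.0, 1.0])

# Two sampled responses with GRPO-style advantages (+1 correct, -1 incorrect).
responses = [
    (+1.0, [FORMAT, FORMAT, FORMAT, KEY]),  # correct, formatting-heavy, longer
    (-1.0, [FORMAT, WRONG]),                # incorrect, shorter
]

def centroid(responses, sign):
    """Advantage-weighted mean token gradient on one side (sign = +1 or -1).

    Returns (centroid, total_weight) such that total_weight * centroid equals
    sum_i |A_i| * sum_t g_{i,t} over responses whose advantage has that sign.
    """
    weighted_sum = np.zeros(3)
    total_weight = 0.0
    for adv, grads in responses:
        if np.sign(adv) == sign:
            weighted_sum += abs(adv) * np.sum(grads, axis=0)
            total_weight += abs(adv) * len(grads)
    return weighted_sum / total_weight, total_weight

mu_pos, w_pos = centroid(responses, +1)  # [0.75, 0.25, 0.0]: mostly formatting
mu_neg, w_neg = centroid(responses, -1)  # [0.5, 0.0, 0.5]

# Sequence-level policy-gradient direction d = sum_i A_i sum_t g_{i,t}
# (normalization constants omitted), written via the two weighted centroids.
d = w_pos * mu_pos - w_neg * mu_neg

def score(g):
    """First-order change in a token's log-prob per unit step along d."""
    return float(g @ d)

for name, g in [("format", FORMAT), ("key", KEY), ("wrong", WRONG)]:
    print(name, score(g))
# format 2.0
# key 1.0
# wrong -1.0
```

In this toy case the positive centroid is dominated by the formatting direction. Because the correct response is longer, the shared formatting component also fails to cancel in d, so the formatting token receives a larger first-order boost (2.0) than the discriminative reasoning token (1.0). This is the dilution effect the abstract describes, and DelTA's per-token coefficients are aimed at amplifying discriminative directions like axis 1 instead.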

Code & Models

Repositories

rucbm/DelTA (GitHub)
