Not all tokens are needed (NAT): token efficient reinforcement learning

Hejian Sang; Yuanda Xu; Zhengze Zhou; Ran He; Zhipeng Wang

arXiv:2603.06619 · cs.LG · March 10, 2026

Open Access

TL;DR

NAT reduces the cost of reinforcement learning for language models by updating the policy on only a selected subset of generated tokens, maintaining full-token performance while lowering computational cost.

Contribution

The paper proposes NAT, a novel framework that employs unbiased partial-token policy-gradient estimation to efficiently scale RL with long sequences.

Findings

01

NAT matches full-token RL performance with only 50% token updates.

02

RPC reduces GPU memory by 18% and training time by 29%.

03

NAT enables more scalable RL for long chain-of-thought trajectories.

Abstract

Reinforcement learning (RL) has become a key driver of progress in large language models, but scaling RL to long chain-of-thought (CoT) trajectories is increasingly constrained by backpropagation over every generated token. Even with optimized rollout engines, full-token updates can consume a large fraction of total training cost, turning token length into a hidden tax on RL. We introduce Not All Tokens Are Needed (NAT), a unified framework that makes the token budget a first-class optimization primitive. NAT updates the policy using only a selected subset of generated tokens while preserving the learning signal of full-sequence RL. The core idea is an unbiased partial-token policy-gradient estimator via Horvitz-Thompson reweighting, which ensures statistically correct gradients despite subsampling. We instantiate NAT with two simple, plug-and-play token selection schemes: Uniform…
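The Horvitz-Thompson idea mentioned in the abstract can be illustrated with a small sketch. This is not the paper's implementation: it assumes each token is kept independently with a known inclusion probability, and it uses hypothetical scalar values in place of real per-token gradient contributions. The helper names (`ht_estimate`, `sample_mask`, `exact_expectation`) are invented for illustration. Dividing each kept term by its inclusion probability makes the expected value of the subsampled sum equal to the full-token sum, which the sketch checks by enumerating every possible mask.

```python
import itertools
import random


def ht_estimate(per_token_terms, mask, inclusion_probs):
    """Horvitz-Thompson estimate of sum(per_token_terms) using kept tokens only."""
    return sum(
        g / p
        for g, keep, p in zip(per_token_terms, mask, inclusion_probs)
        if keep
    )


def sample_mask(inclusion_probs, rng):
    """Keep token t independently with probability inclusion_probs[t] (an assumption)."""
    return [rng.random() < p for p in inclusion_probs]


def exact_expectation(per_token_terms, inclusion_probs):
    """Enumerate every mask, weight it by its probability, and return E[estimate]."""
    total = 0.0
    for mask in itertools.product([False, True], repeat=len(per_token_terms)):
        prob = 1.0
        for keep, p in zip(mask, inclusion_probs):
            prob *= p if keep else (1.0 - p)
        total += prob * ht_estimate(per_token_terms, mask, inclusion_probs)
    return total


# Hypothetical per-token contributions (stand-ins for advantage * grad log-prob).
terms = [0.5, -1.25, 2.0, 0.75, -0.5, 1.5]
probs = [0.5] * len(terms)  # keep roughly half of the tokens

full_sum = sum(terms)                      # what a full-token update would use
expected = exact_expectation(terms, probs)  # average of the subsampled estimator

rng = random.Random(0)
one_draw = ht_estimate(terms, sample_mask(probs, rng), probs)  # a single noisy estimate

print(full_sum, expected, one_draw)
```

Any single draw differs from the full sum, but the average over masks does not. The price of subsampling in this sketch is variance, not bias, and inclusion probabilities closer to 1 shrink that variance at the cost of processing more tokens.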

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Machine Learning in Materials Science