Not All Tokens Learn Alike: Attention Entropy Reveals Heterogeneous Signals in RL Reasoning

Gengyang Li; Zheng-Fan Wu; Siqi Bao; Yunfang Wu

arXiv:2605.07660·cs.CL·May 11, 2026

TL;DR

This paper uses attention entropy to expose heterogeneous token-level learning signals in reinforcement learning for language models. It distinguishes stable low-entropy tokens (anchors) from volatile high-entropy tokens (explorers) and proposes an entropy-aware reweighting method that improves reasoning performance.

Contribution

The paper uses attention entropy to analyze token-level RL signals and develops an entropy-aware reweighting technique that improves the model's reasoning accuracy.

Findings

01. Low-attention-entropy tokens (anchors) provide stable gradients but plateau on hard tasks.

02. High-attention-entropy tokens (explorers) induce volatile gradients but may contain useful signals.

03. Entropy-aware reweighting improves Qwen3-8B-Base's held-out performance from 34.39 to 37.40.
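
The findings above can be illustrated with a small sketch. It assumes per-token attention entropies are already available, splits positions into the lowest-entropy group (anchors) and the highest-entropy group (explorers), and aggregates per-token losses with per-token weights. The function names, the 20 percent cut-off, and the weight-normalized average are hypothetical choices made for illustration; this summary does not specify the paper's thresholds or its actual reweighting rule.

```python
def split_by_entropy(entropies, fraction=0.2):
    """Return (anchor_idx, explorer_idx) for one response.

    Anchors are the lowest-entropy positions, explorers the highest.
    `fraction` is an illustrative cut-off, not a value from the paper.
    """
    n = len(entropies)
    k = min(n, max(1, round(fraction * n))) if n else 0
    # Positions ordered from lowest to highest attention entropy.
    order = sorted(range(n), key=lambda i: entropies[i])
    anchors = sorted(order[:k])
    explorers = sorted(order[n - k:])
    return anchors, explorers


def reweighted_loss(token_losses, weights):
    """Weighted mean of per-token losses (assumed normalization)."""
    if len(token_losses) != len(weights):
        raise ValueError("token_losses and weights must have equal length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(l * w for l, w in zip(token_losses, weights)) / total_weight


# Toy entropies for a 10-token response.
entropies = [0.1, 1.2, 0.5, 1.3, 0.05, 0.9, 0.7, 1.0, 0.3, 0.6]
anchors, explorers = split_by_entropy(entropies, fraction=0.2)
```

With uniform weights, `reweighted_loss` reduces to the plain token-level mean, so any entropy-dependent weighting can be compared against that baseline.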

Abstract

Reinforcement-learning-based post-training has become a key approach for improving the reasoning ability of large language models, but its token-level learning signals remain poorly understood. This work studies their heterogeneity through attention entropy, which measures how concentrated or diffuse the contextual support is for each response token. We first show that token-level RL objectives are sparsely estimable: uniformly random 20 percent token subsets preserve much of the full-token held-out performance, suggesting substantial redundancy in token-level updates. However, entropy-structured subsets behave very differently. Low-attention-entropy tokens, which we call anchors, rely on concentrated support, produce stable gradients aligned with full-token updates, and provide a reliable optimization backbone, but tend to plateau on harder benchmarks. High-attention-entropy tokens,…
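
As a rough illustration of the two quantities the abstract describes, the sketch below computes the Shannon entropy of a single token's attention distribution and draws a uniformly random 20 percent subset of token positions. It assumes one attention distribution per response token; how the paper aggregates attention across heads and layers is not stated in this summary, and the function names and seed handling are hypothetical.

```python
import math
import random


def attention_entropy(weights):
    """Shannon entropy (in nats) of one token's attention distribution.

    Low values mean concentrated contextual support, high values diffuse.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("attention weights must sum to a positive value")
    entropy = 0.0
    for w in weights:
        p = w / total  # renormalize defensively
        if p > 0:
            entropy -= p * math.log(p)
    return entropy


def random_token_mask(num_tokens, fraction=0.2, seed=0):
    """Uniformly random subset of positions (1 = token kept in the loss)."""
    rng = random.Random(seed)
    k = min(num_tokens, max(1, round(fraction * num_tokens))) if num_tokens else 0
    kept = set(rng.sample(range(num_tokens), k))
    return [1 if i in kept else 0 for i in range(num_tokens)]


concentrated = [0.97, 0.01, 0.01, 0.01]  # mostly one source position
diffuse = [0.25, 0.25, 0.25, 0.25]       # spread evenly over four positions
mask = random_token_mask(10, fraction=0.2)
```

A uniform distribution over n positions gives the maximum entropy, log(n), so `diffuse` scores higher than `concentrated`; the mask keeps 2 of the 10 positions.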
