KTAE: A Model-Free Algorithm to Key-Tokens Advantage Estimation in Mathematical Reasoning

Wei Sun; Wen Yang; Pu Jian; Qianlong Du; Fuwei Cui; Shuo Ren; Jiajun Zhang

arXiv:2505.16826 · cs.AI · November 18, 2025

TL;DR

This paper introduces KTAE, a new model-free algorithm that provides fine-grained token-level advantage estimates in mathematical reasoning, improving reinforcement learning performance without additional models.

Contribution

KTAE is a model-free method for token-level advantage estimation. It targets the coarse granularity of reinforcement learning algorithms such as GRPO and DAPO, which assign the same rollout-level advantage to every token in a sequence.

Findings

01. Models trained with KTAE outperform baselines on five mathematical reasoning benchmarks.

02. KTAE achieves higher accuracy with shorter responses.

03. KTAE-trained models surpass R1-Distill-Qwen-1.5B when using the same base model.

Abstract

Recent advances have demonstrated that integrating reinforcement learning with rule-based rewards can significantly enhance the reasoning capabilities of large language models, even without supervised fine-tuning. However, prevalent reinforcement learning algorithms such as GRPO, and variants of it such as DAPO, suffer from a coarse granularity issue when computing the advantage. Specifically, they compute rollout-level advantages that assign identical values to every token within a sequence, which fails to capture token-specific contributions and hinders effective learning. To address this limitation, we propose Key-token Advantage Estimation (KTAE), a novel algorithm that estimates fine-grained, token-level advantages without introducing additional models. KTAE leverages the correctness of sampled rollouts and applies statistical analysis to quantify the importance of individual tokens…
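To make the granularity issue concrete, the sketch below computes a GRPO-style group-normalized advantage (one scalar per rollout) and broadcasts it to every token, then contrasts it with a toy per-token statistic derived from rollout correctness. The function names, the use of population standard deviation, and the "correctness gap" statistic are all assumptions made for illustration; the abstract above is truncated and does not specify KTAE's actual statistics, so this should not be read as the paper's method.

```python
from collections import Counter
from statistics import mean, pstdev


def rollout_level_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: normalize each reward within its sampled group.

    Assumption: population std is used; implementations differ on this.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def broadcast_to_tokens(rollouts, advantages):
    """Every token in a rollout receives that rollout's single advantage."""
    return [[adv] * len(tokens) for tokens, adv in zip(rollouts, advantages)]


def token_correctness_gap(rollouts, rewards):
    """Hypothetical token statistic (NOT the paper's formula).

    For each token: share of correct rollouts containing it minus
    share of incorrect rollouts containing it.
    """
    correct = [set(t) for t, r in zip(rollouts, rewards) if r > 0]
    incorrect = [set(t) for t, r in zip(rollouts, rewards) if r <= 0]
    in_correct = Counter(tok for s in correct for tok in s)
    in_incorrect = Counter(tok for s in incorrect for tok in s)
    n_correct = max(len(correct), 1)      # avoid division by zero
    n_incorrect = max(len(incorrect), 1)
    vocab = set(in_correct) | set(in_incorrect)
    return {
        tok: in_correct[tok] / n_correct - in_incorrect[tok] / n_incorrect
        for tok in vocab
    }


# Toy group of four rollouts for one prompt, with rule-based 0/1 rewards.
rollouts = [["x", "=", "2"], ["x", "=", "3"], ["x", "=", "2", "ok"], ["y", "=", "3"]]
rewards = [1.0, 0.0, 1.0, 0.0]

advantages = rollout_level_advantages(rewards)
per_token = broadcast_to_tokens(rollouts, advantages)
gaps = token_correctness_gap(rollouts, rewards)
```

In this toy group, every token of a correct rollout gets the same positive advantage, including `=`, which appears in all four rollouts and so carries no signal about correctness. The per-token gap separates such tokens from ones like `2` and `3` that appear only in correct or only in incorrect rollouts, which is the kind of distinction a token-level estimate aims to capture.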

Code & Models

Repositories

xiaolizh1/ktae (PyTorch, official)

Taxonomy

Topics: Reinforcement Learning in Robotics · Topic Modeling · Natural Language Processing Techniques

Methods: Dialogue-Adaptive Pre-training Objective · Balanced Selection