Not All Tokens Are What You Need In Thinking

Hang Yuan; Bin Yu; Haotian Li; Shijun Yang; Christina Dan Wang; Zhou Yu; Xueyin Xu; Weizhen Qi; Kai Chen

arXiv:2505.17827 · cs.CL · August 5, 2025

TL;DR

This paper introduces Conditional Token Selection (CTS), a method that compresses the chains of thought produced by reasoning models by keeping only the essential tokens, improving efficiency without sacrificing accuracy.

Contribution

The paper proposes CTS, a novel token-level compression framework that identifies and retains only crucial tokens in reasoning chains, reducing redundancy and computational costs.

Findings

01. CTS reduces reasoning tokens by up to 75.8%.
02. Models trained with CTS maintain high accuracy despite significant token reduction.
03. CTS improves inference efficiency while preserving reasoning performance.

Abstract

Modern reasoning models, such as OpenAI's o1 and DeepSeek-R1, exhibit impressive problem-solving capabilities but suffer from critical inefficiencies: high inference latency, excessive computational resource consumption, and a tendency toward overthinking -- generating verbose chains of thought (CoT) laden with redundant tokens that contribute minimally to the final answer. To address these issues, we propose Conditional Token Selection (CTS), a token-level compression framework with a flexible and variable compression ratio that identifies and preserves only the most essential tokens in CoT. CTS evaluates each token's contribution to deriving correct answers using conditional importance scoring, then trains models on compressed CoT. Extensive experiments demonstrate that CTS effectively compresses long CoT while maintaining strong reasoning performance. Notably, on the GPQA benchmark,…
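The selection step described in the abstract can be pictured with a minimal sketch. It assumes that per-token importance scores have already been computed by some scorer; the paper's conditional importance scoring is not reproduced here. The function name `select_tokens`, the `keep_ratio` parameter, and the example scores are all hypothetical and are not taken from the paper or its repository.

```python
import math
from typing import List, Sequence


def select_tokens(
    tokens: Sequence[str],
    scores: Sequence[float],
    keep_ratio: float,
) -> List[str]:
    """Keep the highest-scoring fraction of tokens, preserving their order.

    `scores` are hypothetical importance values (higher = more important);
    how they are computed is outside the scope of this sketch.
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")

    # Number of tokens to retain for the requested compression ratio.
    n_keep = max(1, math.ceil(keep_ratio * len(tokens)))

    # Rank positions by score, highest first; ties go to the earlier token.
    ranked = sorted(range(len(tokens)), key=lambda i: (-scores[i], i))

    # Restore the original order so the compressed chain stays readable.
    kept_positions = sorted(ranked[:n_keep])
    return [tokens[i] for i in kept_positions]


# Illustrative input: made-up scores for a toy chain of thought.
cot = ["so", "2", "+", "2", "=", "4"]
importance = [0.1, 0.9, 0.8, 0.7, 0.2, 0.95]
compressed = select_tokens(cot, importance, keep_ratio=0.5)
```

Varying `keep_ratio` is one way to picture the "flexible and variable compression ratio" mentioned above. Under this reading, the compressed chains would then serve as training targets.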

Code & Models

Repositories

faustrazor/not-all-thinking-tokens
PyTorch · Official
