Do Thinking Tokens Help or Trap? Towards More Efficient Large Reasoning Model

Bowen Ding; Yuhan Chen; Futing Wang; Lingfeng Ming; Tao Lin

arXiv:2506.23840 · cs.CL · July 1, 2025

TL;DR

This paper investigates the overthinking dilemma that thinking tokens cause in large reasoning models and proposes a novel optimization algorithm, DuP-PO, to improve both token efficiency and reasoning performance.

Contribution

It introduces Dual Policy Preference Optimization (DuP-PO), an algorithm that reduces unnecessary thinking tokens and improves the reasoning efficiency of large reasoning models.
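To make "thinking tokens" concrete, the following minimal sketch counts them in a model response. It assumes a hypothetical word-level view: a response is split into lowercase words with a regular expression, and only the two examples given in the abstract ("wait" and "however") are treated as thinking tokens. A real measurement would use the model's own tokenizer and a fuller token list; the function names here are illustrative, not from the paper.

```python
import re
from collections import Counter

# Hypothetical thinking-token list: only the abstract's two examples.
THINKING_TOKENS = ("wait", "however")


def _words(response: str) -> list[str]:
    """Split a response into lowercase alphabetic words (a simplification of real tokenization)."""
    return re.findall(r"[a-z']+", response.lower())


def thinking_token_counts(response: str) -> Counter:
    """Count how often each thinking token appears in the response."""
    return Counter(w for w in _words(response) if w in THINKING_TOKENS)


def thinking_token_ratio(response: str) -> float:
    """Fraction of words in the response that are thinking tokens."""
    words = _words(response)
    if not words:
        return 0.0
    return sum(1 for w in words if w in THINKING_TOKENS) / len(words)


if __name__ == "__main__":
    text = "Wait, 2 + 2 = 4. However, let me check again. Wait, yes."
    print(thinking_token_counts(text))  # Counter({'wait': 2, 'however': 1})
    print(thinking_token_ratio(text))   # 0.375
```

Digits and symbols are dropped by the regex, so the example response yields eight words, three of which are thinking tokens.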

Findings

1. DuP-PO improves token efficiency on math reasoning benchmarks.

2. The method enhances reasoning performance while reducing overthinking.

3. Experimental results show significant gains over baseline models.

Abstract

Large Reasoning Models (LRMs) excel at solving complex problems but face an overthinking dilemma. When handling simple tasks, they often produce verbose responses overloaded with thinking tokens (e.g., "wait", "however"). These tokens trigger unnecessary high-level reasoning behaviors like reflection and backtracking, reducing efficiency. In this work, our pilot study reveals that these thinking-token-induced behaviors are not essential for effective problem-solving and may even hinder correct reasoning within constrained token budgets. We identify this phenomenon as the thinking trap. To mitigate this issue, we propose Dual Policy Preference Optimization (DuP-PO), a novel algorithm featuring: (1) A rollout sampling strategy that guarantees balanced exposure to responses with and without thinking tokens; (2) A fine-grained advantage control technique to dynamically regulate the prediction…
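The balanced rollout sampling in (1) can be illustrated with a minimal sketch. It assumes a hypothetical toy policy: a fixed, context-free distribution over a tiny vocabulary given as logits, standing in for an LRM. Half of each rollout group is sampled from the unrestricted policy and half with the thinking-token logits masked to negative infinity, so the second half is guaranteed to contain no thinking tokens. The token set, the masking approach, and every function name (`mask_thinking`, `balanced_group`, ...) are assumptions for illustration; the abstract does not specify how DuP-PO actually generates its thinking-free rollouts.

```python
import math
import random

# Hypothetical thinking-token set: the abstract's two examples.
THINKING_TOKENS = {"wait", "however"}


def softmax_sample(logits: dict[str, float], rng: random.Random) -> str:
    """Sample one token from token -> logit, ignoring tokens masked to -inf."""
    finite = {t: l for t, l in logits.items() if l != -math.inf}
    m = max(finite.values())  # subtract the max for numerical stability
    weights = [math.exp(l - m) for l in finite.values()]
    return rng.choices(list(finite), weights=weights)[0]


def mask_thinking(logits: dict[str, float]) -> dict[str, float]:
    """Return a copy of the logits with every thinking token masked to -inf."""
    return {t: (-math.inf if t in THINKING_TOKENS else l) for t, l in logits.items()}


def rollout(logits: dict[str, float], length: int, suppress: bool,
            rng: random.Random) -> list[str]:
    """Sample a rollout from the toy context-free policy, optionally suppressing thinking tokens."""
    step_logits = mask_thinking(logits) if suppress else logits
    return [softmax_sample(step_logits, rng) for _ in range(length)]


def balanced_group(logits: dict[str, float], group_size: int, length: int,
                   seed: int = 0) -> list[list[str]]:
    """Build a rollout group: first half unrestricted, second half thinking-token-free."""
    if group_size % 2:
        raise ValueError("group_size must be even for a 50/50 split")
    rng = random.Random(seed)
    half = group_size // 2
    free = [rollout(logits, length, suppress=False, rng=rng) for _ in range(half)]
    no_think = [rollout(logits, length, suppress=True, rng=rng) for _ in range(half)]
    return free + no_think


if __name__ == "__main__":
    toy_logits = {"wait": 1.0, "however": 0.5, "so": 1.0, "x": 0.8, "=": 0.7, "4": 0.6}
    for r in balanced_group(toy_logits, group_size=4, length=6, seed=1):
        print(" ".join(r))
```

In a real setup both halves would come from the same LRM, decoded with and without the mask. The unrestricted half can still happen to contain no thinking tokens, so a stricter variant might resample until each half actually falls on its side of the split.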

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Reinforcement Learning in Robotics · Intelligent Tutoring Systems and Adaptive Learning

Methods: Balanced Selection