Reconsidering Overthinking: Penalizing Internal and External Redundancy in CoT Reasoning

Jialiang Hong; Taihang Zhen; Kai Chen; Jiaheng Liu; Junlan Feng; Wenpeng Zhu; Jing Huo; Yang Gao; Depeng Wang; Haitao Wan; Xi Yang; Boyan Wang; Fanyu Meng; Yuyao Zhang

arXiv:2508.02178 · cs.AI · January 7, 2026

TL;DR

This paper introduces a semantic-aware reinforcement learning approach to reduce internal and external redundancy in Chain-of-Thought reasoning traces, improving efficiency and interpretability without sacrificing accuracy.

Contribution

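The paper proposes a dual-penalty reinforcement learning framework that treats internal and external redundancy separately: a sliding-window semantic analysis penalizes low-gain steps inside the reasoning trajectory, and a normalized metric suppresses continuation after the final answer. The goal is shorter, more interpretable reasoning traces.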

Findings

01 Significant compression of reasoning traces with minimal accuracy loss

02 External redundancy can be eliminated without performance impact

03 Internal redundancy removal requires careful calibration to preserve reasoning quality

Abstract

Large Reasoning Models (LRMs) often suffer from overthinking, generating verbose reasoning traces that compromise both computational efficiency and interpretability. Unlike prior efforts that rely on global length-based rewards, we propose a semantic-aware decomposition of redundancy into two distinct forms: internal redundancy (informational stagnation within the reasoning process) and external redundancy (superfluous continuation after the final answer). We introduce a dual-penalty reinforcement learning framework that surgically targets these inefficiencies: a sliding-window semantic analysis is employed to penalize low-gain steps within the reasoning trajectory, while a normalized metric suppresses the post-answer tail. Extensive experiments demonstrate that our method significantly compresses Chain-of-Thought traces with minimal accuracy degradation, while maintaining strong…
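The two penalties described in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: the abstract does not give the exact formulas, so the code below assumes a bag-of-words cosine similarity as a stand-in for a semantic embedding model, and the function names, the `window` size, the `answer_marker` string, and the weights `alpha` and `beta` are all hypothetical.

```python
import math
import re
from collections import Counter


def _bow(text):
    """Lowercased bag-of-words vector (ASSUMED stand-in for a semantic embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def internal_redundancy(steps, window=2):
    """Mean, over steps, of the max similarity to the preceding `window` steps.

    A high value means a step adds little beyond what was just said
    (a hypothetical proxy for "low-gain" steps).
    """
    if len(steps) < 2:
        return 0.0
    vecs = [_bow(s) for s in steps]
    scores = []
    for i in range(1, len(vecs)):
        previous = vecs[max(0, i - window):i]
        scores.append(max(_cosine(vecs[i], p) for p in previous))
    return sum(scores) / len(scores)


def external_redundancy(steps, answer_marker="final answer"):
    """Fraction of steps that come after the first step containing the marker.

    Dividing by the trace length keeps the value in [0, 1) regardless of length.
    """
    for i, step in enumerate(steps):
        if answer_marker in step.lower():
            return (len(steps) - 1 - i) / len(steps)
    return 0.0


def shaped_reward(correct, steps, alpha=0.5, beta=0.5):
    """Correctness reward minus weighted redundancy penalties (hypothetical weights)."""
    base = 1.0 if correct else 0.0
    return base - alpha * internal_redundancy(steps) - beta * external_redundancy(steps)


concise = ["Let x = 2.", "Then 2 x = 4.", "Final answer: 4"]
verbose = ["Let x = 2.", "Let x = 2.", "Final answer: 4", "Let me double check."]
print(shaped_reward(True, concise), shaped_reward(True, verbose))
```

In this toy setup the verbose trace scores lower than the concise one for the same correct answer, because it repeats a step and keeps going after the answer. A real system would replace `_bow` with a sentence-embedding model and would need a more robust way to locate the final answer than a fixed marker string.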

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications · Topic Modeling