ThinkLess: A Training-Free Inference-Efficient Method for Reducing Reasoning Redundancy

Gengyang Li, Yifeng Gao, Yuming Li, Yunfang Wu

arXiv:2505.15684 · cs.CL · May 26, 2025

TL;DR

ThinkLess is a training-free method that reduces reasoning redundancy in large language models by terminating reasoning early, substantially improving inference efficiency while maintaining answer quality.

Contribution

It introduces an early-termination approach, guided by attention analysis, that inserts the reasoning terminator token at an earlier position to skip redundant reasoning without fine-tuning the model.
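The early-termination idea can be sketched as a two-phase decoding loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the hypothetical `next_token` callback stands in for a real model's decoding step, the `</think>` terminator string and the fixed `reasoning_budget` cut-off are assumed, and the post-regulation step mentioned in the abstract is not modeled. How ThinkLess actually chooses the termination position is not stated in this summary.

```python
from typing import Callable, List

# Hypothetical stand-in for one greedy decoding step of an LLM:
# given the token sequence so far, return the next token.
NextToken = Callable[[List[str]], str]

THINK_END = "</think>"  # assumed reasoning-terminator token
EOS = "<eos>"           # assumed end-of-sequence token


def think_less_generate(prompt: List[str], next_token: NextToken,
                        reasoning_budget: int,
                        max_answer_tokens: int) -> List[str]:
    """Decode at most `reasoning_budget` reasoning tokens, then force the
    terminator so the model moves on to the answer (early termination)."""
    seq = list(prompt)
    # Phase 1: reasoning, cut off once the budget is exhausted.
    for _ in range(reasoning_budget):
        tok = next_token(seq)
        seq.append(tok)
        if tok == THINK_END:  # the model ended its reasoning on its own
            break
    else:
        # Budget reached without a natural terminator: insert it early.
        seq.append(THINK_END)
    # Phase 2: answer generation conditioned on the shortened reasoning.
    for _ in range(max_answer_tokens):
        tok = next_token(seq)
        if tok == EOS:
            break
        seq.append(tok)
    return seq


def toy_model(seq: List[str]) -> str:
    """Scripted toy 'model': keeps reasoning until it sees the terminator,
    then answers '42' and stops."""
    if THINK_END not in seq:
        return "step"
    return "42" if seq[-1] == THINK_END else EOS


if __name__ == "__main__":
    print(think_less_generate(["Q:"], toy_model, reasoning_budget=3,
                              max_answer_tokens=5))
```

With a budget of 3, the toy model's otherwise endless reasoning is cut after three steps, the terminator is inserted, and the answer is still produced. The `for ... else` construct appends the terminator only when the model did not emit one itself, so it is never duplicated.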

Findings

01. Achieves accuracy comparable to full-length CoT decoding.

02. Significantly reduces decoding time and memory consumption.

03. Operates without fine-tuning or auxiliary data.

Abstract

While Chain-of-Thought (CoT) prompting improves reasoning in large language models (LLMs), the excessive length of reasoning tokens increases latency and KV cache memory usage, and may even truncate final answers under context limits. We propose ThinkLess, an inference-efficient framework that terminates reasoning generation early and maintains output quality without modifying the model. Attention analysis reveals that answer tokens focus minimally on earlier reasoning steps and primarily attend to the reasoning terminator token, due to information migration under causal masking. Building on this insight, ThinkLess inserts the terminator token at earlier positions to skip redundant reasoning while preserving the underlying knowledge transfer. To prevent format disruption caused by early termination, ThinkLess employs a lightweight post-regulation mechanism, relying on the model's…
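The attention observation can be illustrated by measuring how much of each answer token's attention mass lands on the reasoning span versus the terminator position. The sketch below assumes a single, already head-averaged attention matrix given as nested lists (rows are query positions, columns are key positions); the function name `attention_share` and the 6-token matrix are hypothetical, and the values are invented for illustration, not data from the paper.

```python
from typing import List, Sequence


def attention_share(attn: Sequence[Sequence[float]],
                    answer_rows: Sequence[int],
                    key_positions: Sequence[int]) -> float:
    """Average, over answer-token query rows, of the fraction of attention
    mass each row places on `key_positions`."""
    shares: List[float] = []
    for r in answer_rows:
        row = attn[r]
        total = sum(row)  # renormalize in case a row does not sum exactly to 1
        shares.append(sum(row[k] for k in key_positions) / total)
    return sum(shares) / len(shares)


# Hypothetical 6-token causal attention map (lower-triangular):
# position 0 = prompt, 1-3 = reasoning, 4 = terminator, 5 = answer.
# Values are made up to mirror the qualitative pattern described above.
ATTN = [
    [1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0, 0.0, 0.0],
    [0.4, 0.3, 0.3, 0.0, 0.0, 0.0],
    [0.3, 0.2, 0.2, 0.3, 0.0, 0.0],
    [0.2, 0.2, 0.2, 0.2, 0.2, 0.0],
    [0.1, 0.05, 0.05, 0.05, 0.7, 0.05],
]

REASONING = [1, 2, 3]
TERMINATOR = [4]
ANSWER = [5]

if __name__ == "__main__":
    print(f"reasoning share:  {attention_share(ATTN, ANSWER, REASONING):.2f}")
    print(f"terminator share: {attention_share(ATTN, ANSWER, TERMINATOR):.2f}")
```

With a real model, the matrix would come from the attention weights of a chosen layer and head (or an average over them); which layers and heads the authors analyzed is not stated in this summary.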

Taxonomy

Topics: Semantic Web and Ontologies · Intelligent Tutoring Systems and Adaptive Learning

Methods: Focus