Thinkless: LLM Learns When to Think
Gongfan Fang, Xinyin Ma, Xinchao Wang

TL;DR
Thinkless enables LLMs to adaptively choose between concise and detailed reasoning modes, significantly improving efficiency by reducing unnecessary complex reasoning without sacrificing accuracy.
Contribution
This paper introduces a reinforcement learning framework, built on the DeGRPO algorithm, that lets an LLM learn when to think, balancing reasoning depth against computational cost.
Findings
Reduces the use of long-chain reasoning by 50-90% on the evaluated benchmarks.
Maintains high accuracy while improving efficiency.
Demonstrates effective control over reasoning mode selection.
Abstract
Reasoning Language Models, capable of extended chain-of-thought reasoning, have demonstrated remarkable performance on tasks requiring complex logical inference. However, applying elaborate reasoning for all queries often results in substantial computational inefficiencies, particularly when many problems admit straightforward solutions. This motivates an open question: Can LLMs learn when to think? To answer this, we propose Thinkless, a learnable framework that empowers an LLM to adaptively select between short-form and long-form reasoning, based on both task complexity and the model's ability. Thinkless is trained under a reinforcement learning paradigm and employs two control tokens, <short> for concise responses and <think> for detailed reasoning. At the core of our method is a Decoupled Group Relative Policy Optimization (DeGRPO) algorithm, which decomposes the learning objective…
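The mode selection described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes the model exposes a logit for each of the two control tokens at the first decoding position, and the helper names (`mode_probabilities`, `select_mode`, `long_mode_rate`) are hypothetical, introduced here only for illustration. The tie-breaking rule toward the short mode is also an assumption.

```python
import math

# Control tokens named in the abstract.
SHORT, THINK = "<short>", "<think>"


def mode_probabilities(logit_short: float, logit_think: float) -> dict[str, float]:
    """Softmax restricted to the two control tokens (hypothetical helper)."""
    # Subtract the max logit for numerical stability.
    m = max(logit_short, logit_think)
    e_short = math.exp(logit_short - m)
    e_think = math.exp(logit_think - m)
    z = e_short + e_think
    return {SHORT: e_short / z, THINK: e_think / z}


def select_mode(logit_short: float, logit_think: float) -> str:
    """Greedy choice of reasoning mode.

    Assumption: ties go to the short mode, since it is the cheaper one.
    """
    probs = mode_probabilities(logit_short, logit_think)
    return THINK if probs[THINK] > probs[SHORT] else SHORT


def long_mode_rate(modes: list[str]) -> float:
    """Fraction of queries answered with long-form reasoning."""
    return sum(mode == THINK for mode in modes) / len(modes)


if __name__ == "__main__":
    # Made-up logits for three queries; not data from the paper.
    chosen = [select_mode(2.0, 0.5), select_mode(-1.0, 1.0), select_mode(0.3, 0.3)]
    print(chosen)
    print(long_mode_rate(chosen))
```

A statistic such as `long_mode_rate` is one way to express how often the long-form mode is used across a benchmark; the logits in the usage section are invented for the example.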
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Advanced Graph Neural Networks
