The Art of Efficient Reasoning: Data, Reward, and Optimization
Taiqiang Wu, Zenan Xu, Bo Zhou, Ngai Wong

TL;DR
This paper systematically investigates efficient reasoning in large language models, emphasizing reward shaping, length adaptation, and training strategies to improve reasoning accuracy while reducing computational costs.
Contribution
It provides a comprehensive analysis of the mechanics of efficient reasoning, including a two-stage training paradigm and practical guidelines validated across multiple models.
Findings
Maintaining positive reward density prevents the short-is-correct trap.
Learned length bias generalizes across domains and difficulty levels.
Extensive experiments validate the robustness of proposed strategies.
Abstract
Large Language Models (LLMs) consistently benefit from scaled Chain-of-Thought (CoT) reasoning, but also suffer from heavy computational overhead. To address this issue, efficient reasoning aims to incentivize short yet accurate thinking trajectories, typically through reward shaping with Reinforcement Learning (RL). In this paper, we systematically investigate the mechanics of efficient reasoning for LLMs. For comprehensive evaluation, we advocate for more fine-grained metrics, including length distribution conditioned on correctness and performance across a wide spectrum of token budgets ranging from 2k to 32k. First, we reveal that the training process follows a two-stage paradigm: length adaptation and reasoning refinement. Through extensive experiments (about 0.2 million GPU hours) in a unified protocol, we deconstruct training prompts and rollouts, reward shaping, and optimization…
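The two fine-grained metrics named in the abstract, length distribution conditioned on correctness and performance across token budgets, can be illustrated with a minimal sketch. This is not the authors' evaluation code. The rollout record format, the function names, the intermediate budget values between 2k and 32k, and the rule that a response longer than the budget counts as incorrect are all assumptions made for illustration.

```python
from statistics import mean

# Hypothetical rollout records: response length in tokens and whether the
# final answer was correct when generation was allowed to run to completion.
rollouts = [
    {"length": 1500, "correct": True},
    {"length": 6000, "correct": True},
    {"length": 20000, "correct": False},
    {"length": 30000, "correct": True},
]

# Budgets spanning the 2k to 32k range from the abstract; the intermediate
# values are an assumption.
BUDGETS = [2048, 4096, 8192, 16384, 32768]


def budget_accuracy(rollouts, budgets):
    """Accuracy per budget, assuming a response that exceeds the budget is
    truncated and therefore scored as incorrect."""
    return {
        b: mean(1.0 if (r["correct"] and r["length"] <= b) else 0.0
                for r in rollouts)
        for b in budgets
    }


def length_by_correctness(rollouts):
    """Mean response length, split by correctness of the final answer."""
    groups = {True: [], False: []}
    for r in rollouts:
        groups[r["correct"]].append(r["length"])
    return {k: (mean(v) if v else None) for k, v in groups.items()}


acc = budget_accuracy(rollouts, BUDGETS)
lengths = length_by_correctness(rollouts)
```

On this toy data, accuracy rises from 0.25 at the 2k budget to 0.75 at the 32k budget, which is the kind of budget-dependent picture a single accuracy number would hide.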
Code & Models
- 🤗 taki555/Qwen3-0.6B-Art (model) · 71 dl
- 🤗 taki555/Qwen3-1.7B-Art (model) · 87 dl
- 🤗 taki555/Qwen3-4B-Instruct-2507-Art (model) · 72 dl
- 🤗 taki555/Qwen3-30B-A3B-Instruct-2507-Art (model) · 78 dl
- 🤗 taki555/Qwen3-4B-Thinking-2507-Art (model) · 101 dl
- 🤗 taki555/Qwen3-30B-A3B-Thinking-2507-Art (model) · 105 dl
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Advanced Graph Neural Networks
