The Art of Efficient Reasoning: Data, Reward, and Optimization

Taiqiang Wu; Zenan Xu; Bo Zhou; Ngai Wong

arXiv:2602.20945 · cs.CL · March 23, 2026

Open Access · 6 models · 1 dataset

TL;DR

This paper systematically investigates efficient reasoning in large language models (LLMs), focusing on reward shaping, length adaptation, and training strategies that reduce computational cost while keeping reasoning accurate.
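
The abstract frames efficient reasoning as reward shaping with reinforcement learning that favors short yet accurate trajectories. The paper's exact reward is not given on this page, so what follows is only a minimal sketch of one common length-aware scheme, assuming a binary correctness signal and a linear length penalty. The function name `length_shaped_reward` and the parameters `max_tokens` and `alpha` are hypothetical.

```python
def length_shaped_reward(
    is_correct: bool,
    num_tokens: int,
    max_tokens: int = 32768,
    alpha: float = 0.5,
) -> float:
    """Hypothetical length-penalized reward (not the paper's definition).

    Incorrect answers receive 0. Correct answers receive 1 minus a penalty
    that grows linearly with length, clamped at max_tokens.
    """
    if not is_correct:
        return 0.0
    ratio = min(num_tokens, max_tokens) / max_tokens  # in [0, 1]
    return 1.0 - alpha * ratio


# Shorter correct answers score higher; wrong answers score 0 regardless of length.
print(length_shaped_reward(True, 8192))
print(length_shaped_reward(True, 32768))
print(length_shaped_reward(False, 100))
```

With `alpha < 1`, every correct answer keeps a strictly positive reward, so the scheme prefers shorter correct answers without scoring long correct answers the same as failures.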

Contribution

The paper provides a comprehensive analysis of the mechanics of efficient reasoning, identifies a two-stage training paradigm (length adaptation followed by reasoning refinement), and offers practical guidelines validated across multiple models.

Findings

01. Maintaining positive reward density prevents the short-is-correct trap.
02. Learned length bias generalizes across domains and difficulty levels.
03. Extensive experiments validate the robustness of the proposed strategies.
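
Finding 01 refers to positive reward density. One plausible reading, which is an assumption rather than the paper's stated definition, is the fraction of rollouts for a prompt that earn a positive reward. The sketch below computes that fraction and, assuming GRPO-style group-normalized advantages (also an assumption about the optimizer), shows why a group with no positive rewards gives no learning signal. The function names `positive_reward_density` and `group_advantages` are hypothetical.

```python
from statistics import mean, pstdev


def positive_reward_density(rewards: list[float]) -> float:
    """Hypothetical metric: fraction of rollouts in a group with reward > 0."""
    return sum(r > 0 for r in rewards) / len(rewards)


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages (assumed optimizer): each reward minus the group
    mean, divided by the group's population std; eps avoids division by zero."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Every rollout failed: density is 0 and all advantages are 0,
# so this prompt contributes no gradient signal.
dead_group = [0.0, 0.0, 0.0, 0.0]
print(positive_reward_density(dead_group), group_advantages(dead_group))

# Mixed group: rollouts with positive reward get positive advantages.
mixed_group = [0.9, 0.0, 0.6, 0.0]
print(positive_reward_density(mixed_group), group_advantages(mixed_group))
```

This only illustrates the arithmetic of group-normalized rewards; how the paper's short-is-correct trap arises in practice is described in the paper itself.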

Abstract

Large Language Models (LLMs) consistently benefit from scaled Chain-of-Thought (CoT) reasoning, but also suffer from heavy computational overhead. To address this issue, efficient reasoning aims to incentivize short yet accurate thinking trajectories, typically through reward shaping with Reinforcement Learning (RL). In this paper, we systematically investigate the mechanics of efficient reasoning for LLMs. For comprehensive evaluation, we advocate for more fine-grained metrics, including length distribution conditioned on correctness and performance across a wide spectrum of token budgets ranging from 2k to 32k. First, we reveal that the training process follows a two-stage paradigm: length adaptation and reasoning refinement. Through extensive experiments (about 0.2 million GPU hours) in a unified protocol, we deconstruct training prompts and rollouts, reward shaping, and optimization…
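
The abstract advocates two fine-grained metrics: the length distribution conditioned on correctness, and performance across token budgets from 2k to 32k. A minimal sketch of both follows, assuming each evaluated response is recorded as an `(is_correct, num_tokens)` pair, that the budgets are powers of two from 2048 to 32768, and that a response longer than a budget counts as incorrect at that budget (as if truncated). The names `BUDGETS`, `accuracy_at_budgets`, and `length_by_correctness` are hypothetical, and the paper's actual protocol may differ.

```python
from statistics import median

# Assumed budget grid: powers of two spanning the 2k-32k range.
BUDGETS = [2048, 4096, 8192, 16384, 32768]


def accuracy_at_budgets(samples, budgets=BUDGETS):
    """samples: list of (is_correct, num_tokens) pairs.

    A sample counts as correct at budget B only if it is correct AND finishes
    within B tokens (assumption: over-budget responses are scored as wrong).
    """
    n = len(samples)
    return {b: sum(ok and length <= b for ok, length in samples) / n for b in budgets}


def length_by_correctness(samples):
    """Median response length, computed separately for correct and incorrect samples."""
    correct = [length for ok, length in samples if ok]
    wrong = [length for ok, length in samples if not ok]
    return {
        "correct": median(correct) if correct else None,
        "incorrect": median(wrong) if wrong else None,
    }


samples = [(True, 1500), (True, 6000), (False, 20000), (True, 30000)]
print(accuracy_at_budgets(samples))
print(length_by_correctness(samples))
```

Reporting the whole budget curve, rather than a single accuracy number at the maximum length, shows whether a model stays accurate when the generation budget is tight.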

Code & Models

Models

Datasets

taki555/DeepScaleR-Easy
dataset · 22 downloads

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Advanced Graph Neural Networks