How Far Are We from Optimal Reasoning Efficiency?
Jiaxuan Gao, Shu Yan, Qixin Tan, Lu Yang, Shusheng Xu, Wei Fu, Zhiyu Mei, Kaifeng Lyu, Yi Wu

TL;DR
This paper introduces a new metric called REG to measure reasoning efficiency gaps in large reasoning models, proposes reinforcement learning algorithms to minimize this gap, and demonstrates significant efficiency improvements with minimal accuracy loss.
Contribution
The paper presents reasoning efficiency frontiers, the unified REG metric, and the REO-RL algorithm for systematically improving reasoning efficiency in large reasoning models.
Findings
REG effectively captures accuracy-length trade-offs.
REO-RL reduces REG by over 50% across models.
REO-RL approaches the efficiency frontiers with minimal accuracy loss.
Abstract
Large Reasoning Models (LRMs) demonstrate remarkable problem-solving capabilities through extended Chain-of-Thought (CoT) reasoning but often produce excessively verbose and redundant reasoning traces. This inefficiency incurs high inference costs and limits practical deployment. While existing fine-tuning methods aim to improve reasoning efficiency, assessing their efficiency gains remains challenging due to inconsistent evaluations. In this work, we introduce the reasoning efficiency frontiers, empirical upper bounds derived from fine-tuning base LRMs across diverse approaches and training configurations. Based on these frontiers, we propose the Reasoning Efficiency Gap (REG), a unified metric quantifying deviations of any fine-tuned LRMs from these frontiers. Systematic evaluation on challenging mathematical benchmarks reveals significant gaps in current methods: they either…
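To make the idea of a frontier and a gap concrete, here is a minimal sketch of one plausible reading: the frontier is taken as the best accuracy any fine-tuned model reaches at each token budget, and the gap is the area between that frontier and a single model's accuracy-versus-budget curve. The function names, the trapezoidal integration, and the numbers are all assumptions for illustration; they are not the paper's definitions or results.

```python
def efficiency_frontier(curves):
    """Pointwise best accuracy across models at each token budget (assumed form)."""
    return [max(values) for values in zip(*curves)]


def efficiency_gap(budgets, frontier, model_acc):
    """Trapezoidal area between the frontier and one model's accuracy curve."""
    gaps = [f - a for f, a in zip(frontier, model_acc)]
    area = 0.0
    for i in range(1, len(budgets)):
        width = budgets[i] - budgets[i - 1]
        area += 0.5 * (gaps[i] + gaps[i - 1]) * width
    return area


# Made-up accuracies at three token budgets, for illustration only.
budgets = [1000, 2000, 4000]
model_a = [0.40, 0.60, 0.70]
model_b = [0.50, 0.55, 0.75]

frontier = efficiency_frontier([model_a, model_b])
gap_a = efficiency_gap(budgets, frontier, model_a)
gap_b = efficiency_gap(budgets, frontier, model_b)
print(f"gap A: {gap_a:.1f}, gap B: {gap_b:.1f}")
```

Under this reading, a smaller gap means the model sits closer to the best known accuracy at every budget, which is why a single number can summarize the accuracy-length trade-off.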
Taxonomy
Topics: Multimodal Machine Learning Applications · Constraint Satisfaction and Optimization · Reinforcement Learning in Robotics
Methods: Balanced Selection · ALIGN · Sparse Evolutionary Training
