What makes Reasoning Models Different? Follow the Reasoning Leader for Efficient Decoding

Ming Li; Zhengyuan Yang; Xiyao Wang; Dianqi Li; Kevin Lin; Tianyi Zhou; Lijuan Wang

arXiv:2506.06998 · cs.CL · June 10, 2025

Open Access

TL;DR

This paper analyzes the behavior of large reasoning models, identifies key misalignment phenomena, and introduces FoReaL-Decoding, a collaborative decoding method that reduces inference costs while maintaining high reasoning performance.

Contribution

The paper uncovers novel misalignment phenomena in reasoning models and proposes FoReaL-Decoding, a new decoding strategy that improves efficiency without sacrificing accuracy.

Findings

01 Reduces FLOPs by 30-50% on reasoning benchmarks

02 Cuts Chain of Thought length by up to 40%

03 Maintains 86-100% of model performance

Abstract

Large reasoning models (LRMs) achieve strong reasoning performance by emitting long chains of thought. Yet these verbose traces slow down inference and often drift into unnecessary detail, known as the overthinking phenomenon. To better understand LRMs' behavior, we systematically analyze the token-level misalignment between reasoning and non-reasoning models. While it is expected that their primary difference lies in the stylistic "thinking cues", LRMs uniquely exhibit two pivotal, previously under-explored phenomena: a Global Misalignment Rebound, where their divergence from non-reasoning models persists or even grows as response length increases, and more critically, a Local Misalignment Diminish, where the misalignment concentrates at the "thinking cues" each sentence starts with but rapidly declines in the remainder of the sentence. Motivated by the Local Misalignment Diminish, we…
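To make the Local Misalignment Diminish concrete, here is a minimal sketch of how a per-position misalignment rate could be tallied. It is an illustration under stated assumptions, not the paper's measurement code: it assumes misalignment means the non-reasoning model's top-1 prediction differs from the token the reasoning model actually emitted, that sentences end at ".", "?" or "!", and that both token lists are already aligned one-to-one. The function name `misalignment_by_position`, the `max_pos` parameter, and the toy tokens are all hypothetical.

```python
from collections import defaultdict

# Assumption: a sentence ends at one of these tokens (a simplification).
SENTENCE_END = {".", "?", "!"}


def misalignment_by_position(reasoning_tokens, baseline_predictions, max_pos=8):
    """Return the mismatch rate for each within-sentence position.

    Position 0 is the first token of a sentence. Positions at or beyond
    max_pos are pooled into the last bucket.
    """
    if len(reasoning_tokens) != len(baseline_predictions):
        raise ValueError("token lists must be aligned one-to-one")

    mismatches = defaultdict(int)
    totals = defaultdict(int)
    pos = 0
    for actual, predicted in zip(reasoning_tokens, baseline_predictions):
        bucket = min(pos, max_pos)
        totals[bucket] += 1
        if actual != predicted:
            mismatches[bucket] += 1
        # Restart the position counter after a sentence-ending token.
        pos = 0 if actual in SENTENCE_END else pos + 1

    return {p: mismatches[p] / totals[p] for p in sorted(totals)}


# Toy data (hypothetical): the two models disagree only on the
# sentence-initial "thinking cue" tokens.
reasoning = ["Wait", ",", "the", "sum", "is", "6", ".",
             "Hmm", ",", "so", "x", "is", "2", "."]
baseline = ["The", ",", "the", "sum", "is", "6", ".",
            "So", ",", "so", "x", "is", "2", "."]

rates = misalignment_by_position(reasoning, baseline)
for position, rate in rates.items():
    print(position, rate)
```

In this toy input the mismatch rate is highest at position 0 and zero afterwards, which is the shape the abstract describes: disagreement concentrated on the cue that opens each sentence and fading over the rest of it.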

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Constraint Satisfaction and Optimization