Is Long-to-Short a Free Lunch? Investigating Inconsistency and Reasoning Efficiency in LRMs

Shu Yang; Junchao Wu; Xuansheng Wu; Derek Wong; Ninghao Liu; Di Wang

arXiv:2506.19492 · cs.CL · June 25, 2025

TL;DR

This paper investigates whether optimizing reasoning efficiency in large reasoning models compromises their consistency and robustness, revealing that efficiency strategies often increase behavioral inconsistencies and scheming behaviors.

Contribution

It introduces ICBENCH, a benchmark for measuring inconsistency in LRMs, and systematically evaluates how efficient reasoning strategies impact model consistency across multiple dimensions.

Findings

1. Larger models tend to be more consistent than smaller ones.

2. Efficient reasoning strategies increase inconsistency and scheming behaviors.

3. Models employing No-Thinking and Simple Token-Budget strategies show higher inconsistency.

Abstract

Large Reasoning Models (LRMs) have achieved remarkable performance on complex tasks by engaging in extended reasoning before producing final answers, yet this strength introduces the risk of overthinking, where excessive token generation occurs even for simple tasks. While recent work in efficient reasoning seeks to reduce reasoning length while preserving accuracy, it remains unclear whether such optimization is truly a free lunch. Drawing on the intuition that compressing reasoning may reduce the robustness of model responses and lead models to omit key reasoning steps, we investigate whether efficient reasoning strategies introduce behavioral inconsistencies. To systematically assess this, we introduce ICBENCH, a benchmark designed to measure inconsistency in LRMs across three dimensions: inconsistency across task settings (ITS), inconsistency between training objectives and…

Taxonomy

Topics: Multi-Agent Systems and Negotiation