Measuring and curing reasoning rigidity: from decorative chain-of-thought to genuine faithfulness

Abhinaba Basu; Pavan Chakraborty

arXiv:2603.22816·cs.CL·April 14, 2026

TL;DR

This paper introduces the Step-Level Reasoning Capacity (SLRC) metric to measure reasoning rigidity in language models and the LC-CoSR training method to reduce it, improving faithfulness and robustness.

Contribution

It proposes a new metric for reasoning rigidity and a training method with stability guarantees, and it evaluates 16 frontier models, revealing the impact of RL-based training on reasoning faithfulness.

Findings

01

OpenAI's o4-mini has the highest SLRC among evaluated models.

02

RL-based reasoning training influences faithfulness more than thinking tokens.

03

High-SLRC models are more susceptible to sycophancy, a finding that motivates the RIS metric.

Abstract

Language models increasingly show their work by writing step-by-step reasoning before answering. But are these steps genuinely used, or is the answer rigid, that is, fixed before reasoning begins? We introduce the Step-Level Reasoning Capacity (SLRC) metric and prove it is a consistent causal estimator (Theorem 1). We propose LC-CoSR, a training method with Lyapunov stability guarantees that directly reduces rigidity. Evaluating 16 frontier models (o4-mini, GPT-5.4, Claude Opus, Grok-4, DeepSeek-R1, Gemini 2.5 Pro, and others) across six domains at N=133-500, we find reasoning falls into three modes. OpenAI's o4-mini shows 73.8-88.3% step necessity on five of six tasks, the highest SLRC in our study. The critical differentiator is RL-based reasoning training, not thinking tokens: Grok-4's reasoning mode shows lower faithfulness than its non-reasoning mode (1.4% vs 7.2% necessity).…
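
To make the idea of "step necessity" concrete, the following is a minimal sketch of one plausible way to measure it: delete each reasoning step in turn and check whether the final answer changes. This is an illustration under stated assumptions, not the paper's SLRC definition, which the abstract does not spell out. The names step_necessity, answer_fn, rigid_model, and faithful_model are hypothetical, and the two toy "models" are stand-ins for a real language model call.

```python
from typing import Callable, Sequence


def step_necessity(
    steps: Sequence[str],
    answer_fn: Callable[[Sequence[str]], str],
) -> float:
    """Fraction of steps whose removal changes the final answer.

    Assumption: a step counts as "necessary" if ablating it alone
    flips the answer. The paper's actual estimator may differ.
    """
    if not steps:
        return 0.0
    baseline = answer_fn(steps)
    changed = 0
    for i in range(len(steps)):
        # Remove exactly one step and re-derive the answer.
        ablated = [s for j, s in enumerate(steps) if j != i]
        if answer_fn(ablated) != baseline:
            changed += 1
    return changed / len(steps)


# Hypothetical toy models, for illustration only.
def rigid_model(steps: Sequence[str]) -> str:
    # Ignores its reasoning entirely: the answer is fixed in advance.
    return "42"


def faithful_model(steps: Sequence[str]) -> str:
    # Answer is the sum of the integers stated on the right of each "=".
    return str(sum(int(s.split("=")[-1]) for s in steps))


steps = ["a = 2", "b = 3", "c = 5"]
rigid_score = step_necessity(steps, rigid_model)
faithful_score = step_necessity(steps, faithful_model)
```

In this toy setup the rigid model scores 0.0, because no step affects its answer, and the faithful model scores 1.0, because every step does. A real evaluation would replace answer_fn with a call that feeds the edited reasoning back to the model under test.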
