Reasoning Model is Stubborn: Diagnosing Instruction Overriding in Reasoning Models

Doohyuk Jang; Yoonjeon Kim; Chanjae Park; Hyun Ryu; Eunho Yang

arXiv:2505.17225 · cs.AI · May 26, 2025

TL;DR

This paper investigates reasoning rigidity in large language models, revealing their tendency to default to habitual reasoning despite explicit instructions, and introduces a diagnostic dataset to analyze and address this behavior.

Contribution

The paper introduces a diagnostic dataset, ReasoningTrap, to systematically study reasoning rigidity, and identifies the contamination patterns that cause models to ignore instructions.

Findings

01

Models exhibit three contamination modes: Interpretation Overload, Input Distrust, and Partial Instruction Attention.

02

The diagnostic set reveals recurring patterns of reasoning rigidity in models.

03

The dataset is publicly released to facilitate future research on mitigating reasoning rigidity.

Abstract

Large language models have demonstrated remarkable proficiency in long and complex reasoning tasks. However, they frequently exhibit a problematic reliance on familiar reasoning patterns, a phenomenon we term reasoning rigidity. Despite explicit instructions from users, these models often override clearly stated conditions and default to habitual reasoning trajectories, leading to incorrect conclusions. This behavior presents significant challenges, particularly in domains such as mathematics and logic puzzles, where precise adherence to specified constraints is critical. To systematically investigate reasoning rigidity, a behavior largely unexplored in prior work, we introduce an expert-curated diagnostic set, ReasoningTrap. Our dataset includes specially modified variants of existing mathematical benchmarks, namely AIME and MATH500, as well as well-known puzzles deliberately…

Peer Reviews

Decision: ICLR 2026 Conference Withdrawn Submission

Reviewer 01 · Rating: 4 · Confidence: 4

Strengths

1. The notion of reasoning rigidity is interesting. A better interpretation could have been drawn from CBR. Nonetheless, this seems to be a useful failure mode to analyse.
2. The authors introduce a benchmark comprising both logical and math problems to study the failure mode. This seems useful. ReasoningTrap seems to be carefully structured to reveal subtle reasoning biases and contamination effects.
3. The experiments show some evidence of the existence of the rigidity problem.

Weaknesses

1. The phenomenon lies between instruction adherence and reasoning failure. It seems like a small subproblem, and I am unsure how important or insightful it is to study it separately.
2. Dataset creation relies overly on GPT models.
3. Annotation procedures lack crucial details.
4. Certain findings (e.g., base models outperforming reasoning models) are neither analyzed in depth nor theoretically argued.
5. There are some concerns with the choices in the experimental setup (see below).

Reviewer 02 · Rating: 4 · Confidence: 3

Strengths

1. Strong motivation and problem insight. The paper starts from an important and underexplored observation: reasoning models sometimes overthink or overfit to familiar reasoning patterns, which leads to rigidity when solving slightly perturbed problems. This provides a good benchmark for evaluating the overthinking issue of reasoning models.
2. Well-designed diagnostic dataset. The proposed ReasoningTrap dataset is thoughtfully constructed and fills a clear gap in the community. It offers a systematic…

Weaknesses

1. Conceptual boundary with overthinking. The notion of reasoning rigidity seems conceptually close to overthinking. It is unclear whether the proposed phenomenon is a specific manifestation of overthinking or a distinct failure mode. A clearer conceptual distinction or theoretical framing would strengthen the contribution.
2. Lack of difficulty-level analysis. The paper could analyze how reasoning rigidity varies with problem difficulty. For instance, are models more rigid on harder problems…

Reviewer 03 · Rating: 2 · Confidence: 4

Strengths

1) The paper identifies and formalizes a meaningful yet underexplored problem in reasoning-model research: reasoning rigidity, where models override explicit user constraints and revert to habitual reasoning patterns.
2) This focus appears novel and practically relevant, as it highlights a distinct failure mode beyond hallucination or faithfulness that directly impacts instruction reliability in reasoning-intensive domains such as mathematics and logic puzzles.
3) The dataset and taxonomy p…

Weaknesses

**W1. Ambiguous and Inconsistent Definition of “Reasoning Rigidity.”** The central concept, reasoning rigidity, is defined inconsistently across sections, which may create conceptual confusion. In the introduction, it refers broadly to models editing or ignoring user-given conditions, implying any failure to follow instructions. Later, it is subdivided into three “rigidity patterns” (Interpretation Overload, Input Distrust, Partial Instruction Attention), which mostly describe m…

Reviewer 04 · Rating: 4 · Confidence: 3

Strengths

The phenomenon of reasoning rigidity is interesting and clearly explained. The proposed diagnostic set effectively shows the drawbacks of reasoning rigidity. The proposed mitigation solutions are effective.

Weaknesses

1. The diagnostic set construction process is not rigorous. For ConditionedMath, the construction relies fully on LLMs, which may not be reliable. Since there are only 84 questions, humans should be able to verify them one by one. As for PuzzleTrivial, the construction process is not clear: are the puzzles constructed by LLMs or by human annotators?
2. In Section 4.1, the authors propose relative cosine similarity to measure the contamination ratio without justification, which makes the analysis questionable…
