When Thinking Fails: The Pitfalls of Reasoning for Instruction-Following in LLMs

Xiaomin Li; Zhou Yu; Zhiwei Zhang; Xupeng Chen; Ziji Zhang; Yingying Zhuang; Narayanan Sadagopan; Anurag Beniwal

arXiv:2505.11423 · cs.CL · September 3, 2025

TL;DR

This paper reveals that chain-of-thought prompting can impair instruction-following accuracy in large language models, and proposes strategies to mitigate these negative effects, highlighting a previously overlooked pitfall in reasoning-enhanced LLMs.

Contribution

The paper systematically documents the negative impact of explicit reasoning on instruction-following and introduces practical mitigation techniques, along with a novel attention-based metric.

Findings

01 CoT prompting often reduces instruction-following accuracy.

02 Selective reasoning strategies can recover performance.

03 Attention diversion explains reasoning-induced failures.

Abstract

Reasoning-enhanced large language models (RLLMs), whether explicitly trained for reasoning or prompted via chain-of-thought (CoT), have achieved state-of-the-art performance on many complex reasoning tasks. However, we uncover a surprising and previously overlooked phenomenon: explicit CoT reasoning can significantly degrade instruction-following accuracy. Evaluating 15 models on two benchmarks, IFEval (with simple, rule-verifiable constraints) and ComplexBench (with complex, compositional constraints), we consistently observe performance drops when CoT prompting is applied. Through large-scale case studies and an attention-based analysis, we identify common patterns where reasoning either helps (e.g., with formatting or lexical precision) or hurts (e.g., by neglecting simple constraints or introducing unnecessary content). We propose a metric, constraint attention, to quantify model…
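
To make "simple, rule-verifiable constraints" concrete, here is a minimal sketch in Python of how such constraints might be checked programmatically. It is not IFEval's actual code: the helper names (max_words, includes_keyword, all_lowercase, instruction_following_accuracy) and the example constraints are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical rule-based checkers; each returns True if the response satisfies the rule.
def max_words(limit):
    """Build a check that the response has at most `limit` whitespace-separated words."""
    def check(response):
        return len(response.split()) <= limit
    return check

def includes_keyword(keyword):
    """Build a check that the response contains `keyword` as a whole word (case-insensitive)."""
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", flags=re.IGNORECASE)
    def check(response):
        return pattern.search(response) is not None
    return check

def all_lowercase(response):
    """Check that the response contains no uppercase letters."""
    return response == response.lower()

def instruction_following_accuracy(response, checks):
    """Fraction of constraints satisfied by a single response (assumed scoring rule)."""
    results = [check(response) for check in checks]
    return sum(results) / len(results)

checks = [max_words(10), includes_keyword("fox"), all_lowercase]
concise = "the quick brown fox jumps"
verbose = "Let me explain: the quick brown fox jumps over the lazy dog again and again"
print(instruction_following_accuracy(concise, checks))
print(instruction_following_accuracy(verbose, checks))
```

In this toy setup the second response violates the length and lowercase rules, the kind of "neglecting simple constraints or introducing unnecessary content" failure the abstract describes. Under these assumptions, comparing such scores with and without CoT prompting is one way to measure the reported accuracy drop.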

Videos

When Thinking Fails: The Pitfalls of Reasoning for Instruction-Following in LLMs · SlidesLive

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Text Readability and Simplification

Methods: Softmax · Attention Is All You Need · Chain-of-thought prompting · Focus