Comment on The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity

A. Lawsen

arXiv:2506.09250 · cs.AI · June 18, 2025 · 2 cites

TL;DR

This paper critiques prior claims that reasoning models fail on complex puzzles. It identifies flaws in the experimental design and shows that models can perform well under proper evaluation methods, underscoring the importance of careful experimental setup.

Contribution

The paper reveals critical issues in previous reasoning model evaluations and proposes improved evaluation strategies that better reflect models' true reasoning capabilities.

Findings

01

Models perform well on Tower of Hanoi when output token limits are accounted for.

02

Automated evaluation frameworks can misclassify failures caused by practical constraints as reasoning failures.

03

Unsolvable problem instances in benchmarks can lead to incorrect failure assessments.
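
The first finding turns on output length. The shortest Tower of Hanoi solution for N disks takes 2^N - 1 moves, so a full move listing grows exponentially with N. The Python sketch below estimates where such a listing would exceed an output budget. The tokens-per-move figure and the token budget are hypothetical values chosen for illustration, not numbers taken from the paper.

```python
def hanoi_moves(n_disks: int) -> int:
    """Minimum number of moves needed to solve Tower of Hanoi with n_disks disks."""
    return 2 ** n_disks - 1


def estimated_output_tokens(n_disks: int, tokens_per_move: int = 10) -> int:
    """Rough size of a complete move listing.

    tokens_per_move is an assumed average, not a measured value.
    """
    return hanoi_moves(n_disks) * tokens_per_move


def largest_enumerable_n(token_budget: int, tokens_per_move: int = 10) -> int:
    """Largest N whose full move listing fits within token_budget (an assumed limit)."""
    n = 0
    while estimated_output_tokens(n + 1, tokens_per_move) <= token_budget:
        n += 1
    return n


if __name__ == "__main__":
    # Hypothetical budget of 64,000 output tokens at 10 tokens per move.
    for n in (5, 10, 15):
        print(n, hanoi_moves(n), estimated_output_tokens(n))
    print("largest N that fits:", largest_enumerable_n(64_000))
```

Under these assumed values the listing stops fitting at a fairly small N, whatever the model's ability to describe the recursive procedure. Changing either assumption shifts the cutoff but not the exponential shape.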

Abstract

Shojaee et al. (2025) report that Large Reasoning Models (LRMs) exhibit "accuracy collapse" on planning puzzles beyond certain complexity thresholds. We demonstrate that their findings primarily reflect experimental design limitations rather than fundamental reasoning failures. Our analysis reveals three critical issues: (1) Tower of Hanoi experiments risk exceeding model output token limits, with models explicitly acknowledging these constraints in their outputs; (2) The authors' automated evaluation framework fails to distinguish between reasoning failures and practical constraints, leading to misclassification of model capabilities; (3) Most concerningly, their River Crossing benchmarks include mathematically impossible instances for N > 5 due to insufficient boat capacity, yet models are scored as failures for not solving these unsolvable problems. When we control for these…
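
Whether a River Crossing instance is solvable at all can be checked by exhaustive search before any model is scored on it. The Python sketch below does this with breadth-first search. It assumes the usual actor/agent rule set (an actor may not share a bank or the boat with another actor's agent unless the actor's own agent is present); that rule set, the function names, and the capacity values in the example are assumptions made for illustration, since this page does not state them.

```python
from collections import deque
from itertools import combinations
from typing import Iterable, Optional, Tuple

Person = Tuple[str, int]  # ("actor", i) or ("agent", i); pair i belongs together


def is_safe(group: Iterable[Person]) -> bool:
    """Assumed rule: if any agent is present, every actor present needs its own agent."""
    group = list(group)
    agents = {i for kind, i in group if kind == "agent"}
    if not agents:
        return True
    return all(i in agents for kind, i in group if kind == "actor")


def min_crossings(n_pairs: int, boat_capacity: int) -> Optional[int]:
    """Fewest boat trips that move everyone across, or None if no solution exists."""
    everyone = frozenset(
        (kind, i) for kind in ("actor", "agent") for i in range(n_pairs)
    )
    start = (everyone, True)  # (people on the left bank, boat is on the left)
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        (left, boat_left), depth = queue.popleft()
        if not left:
            return depth
        bank = left if boat_left else everyone - left
        for size in range(1, boat_capacity + 1):
            for boat in combinations(sorted(bank), size):
                boat_set = frozenset(boat)
                new_left = left - boat_set if boat_left else left | boat_set
                # The boat and both banks must satisfy the rule after the trip.
                if not (
                    is_safe(boat_set)
                    and is_safe(new_left)
                    and is_safe(everyone - new_left)
                ):
                    continue
                state = (new_left, not boat_left)
                if state not in seen:
                    seen.add(state)
                    queue.append((state, depth + 1))
    return None  # search space exhausted: the instance is unsolvable


if __name__ == "__main__":
    # Hypothetical capacity of 3; the page does not give the benchmark's value.
    for n in range(1, 8):
        print(n, min_crossings(n, boat_capacity=3))
```

A benchmark harness could run such a check once per (N, capacity) setting and exclude any instance for which it returns `None`, so that a model is never marked wrong for failing to solve an impossible puzzle.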

Taxonomy

Topics: AI-based Problem Solving and Planning · Constraint Satisfaction and Optimization · Bayesian Modeling and Causal Inference