Rethinking the Illusion of Thinking
Iñaki Dellibarda Varela, Pablo Romero-Sorozabal, Eduardo Rocon, Manuel Cebrian

TL;DR
This paper refines and replicates key benchmarks to clarify the reasoning capabilities of Large Reasoning Models, revealing their limitations and potential in complex symbolic tasks.
Contribution
It introduces incremental prompting and collaborative dialogue techniques to better evaluate LRM reasoning, challenging previous claims of their incapability.
Findings
LRMs struggle with moderately complex problems like 8-disk Towers of Hanoi.
LRMs can solve large, solvable instances of River Crossing with over 100 agent pairs.
Refined benchmarks provide a more accurate assessment of LRM reasoning abilities.
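For a sense of scale in the 8-disk Towers of Hanoi case, the following minimal sketch generates the optimal move sequence and replays a candidate sequence to check that it is legal. It is an illustration only, not the paper's evaluation harness: the function names `solve_hanoi` and `is_valid_solution`, the 0-indexed pegs, and the `(from_peg, to_peg)` move encoding are all assumptions made here. The optimal solution for n disks takes 2^n - 1 moves, so 8 disks require 255.

```python
def solve_hanoi(n, source=0, target=2, spare=1):
    """Return the optimal move list for n disks as (from_peg, to_peg) pairs.

    Hypothetical helper for illustration; pegs are numbered 0, 1, 2.
    """
    if n == 0:
        return []
    # Move n-1 disks out of the way, move the largest disk, then restack.
    return (
        solve_hanoi(n - 1, source, spare, target)
        + [(source, target)]
        + solve_hanoi(n - 1, spare, target, source)
    )


def is_valid_solution(n, moves):
    """Replay moves and check that each is legal and all disks end on peg 2.

    Hypothetical checker; assumes every peg index in moves is 0, 1, or 2.
    """
    # Peg 0 starts with disks n..1, largest at the bottom of the stack.
    pegs = [list(range(n, 0, -1)), [], []]
    for src, dst in moves:
        if not pegs[src]:
            return False  # nothing to move from an empty peg
        disk = pegs[src][-1]
        if pegs[dst] and pegs[dst][-1] < disk:
            return False  # a larger disk may not sit on a smaller one
        pegs[dst].append(pegs[src].pop())
    return pegs[2] == list(range(n, 0, -1)) and not pegs[0] and not pegs[1]
```

Under these assumptions, `len(solve_hanoi(8))` is 255 and `is_valid_solution(8, solve_hanoi(8))` returns `True`. A checker of this kind could in principle score a model's moves one step at a time, which is the spirit of stepwise evaluation, though the paper's actual prompting protocol is not reproduced here.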
Abstract
Earlier this year, Apple ignited controversy by publishing "The Illusion of Thinking," prompting heated debate within the AI community. Critics seized upon the findings as conclusive evidence that Large Reasoning Models (LRMs) lack genuine reasoning capabilities, branding them as mere stochastic parrots. Meanwhile, defenders, spearheaded by Lawsen et al. (2025), fired back, condemning the experimental setup as flawed and the conclusions overstated. We clarify this debate by replicating and refining two of the original study's most contentious benchmarks: Towers of Hanoi and River Crossing. By introducing incremental stepwise prompting and agentic collaborative dialogue, we show that previously reported failures in solving the Towers of Hanoi were not purely the result of output constraints, but also partly a result of cognitive limitations: LRMs still stumble when complexity rises moderately…
Taxonomy
Topics: Ethics and Social Impacts of AI · Innovation, Sustainability, Human-Machine Systems · Embodied and Extended Cognition
