Are Your Reasoning Models Reasoning or Guessing? A Mechanistic Analysis of Hierarchical Reasoning Models

Zirui Ren; Ziming Liu

arXiv:2601.10679·cs.AI·March 24, 2026

TL;DR

This paper takes a mechanistic look at the hierarchical reasoning model (HRM), argues that it often appears to guess rather than reason, and introduces strategies that raise its Sudoku-Extreme accuracy from 54.5% to 96.9%.

Contribution

The paper provides a mechanistic analysis of HRM, identifies its failure modes, and proposes guess-scaling strategies to improve reasoning accuracy.

Findings

01

HRM can fail on extremely simple puzzles, even one with a single unknown cell, because the fixed point property it assumes is violated.

02

HRM exhibits 'grokking' dynamics across reasoning steps: the answer does not improve uniformly, and a single critical step suddenly makes it correct.

03

Multiple fixed points exist: HRM "guesses" the first one, which may be incorrect, and can get trapped there.

Abstract

The hierarchical reasoning model (HRM) achieves extraordinary performance on various reasoning tasks, significantly outperforming large language model-based reasoners. To understand the strengths and potential failure modes of HRM, we conduct a mechanistic study of its reasoning patterns and find three surprising facts: (a) Failure on extremely simple puzzles, e.g., HRM can fail on a puzzle with only one unknown cell. We attribute this failure to the violation of the fixed point property, a fundamental assumption of HRM. (b) "Grokking" dynamics in reasoning steps, i.e., the answer does not improve uniformly; instead, there is a critical reasoning step that suddenly makes the answer correct. (c) Existence of multiple fixed points. HRM "guesses" the first fixed point, which could be incorrect, and gets trapped there for a while or forever. All facts imply that HRM appears to be "guessing"…
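
The notion of multiple fixed points in (c) can be illustrated with a toy iteration. The sketch below is not HRM and is not taken from the paper: the update rule `step` (here tanh(2z)), the helper names, and the tolerances are all assumptions chosen purely for illustration. It only shows that when an iterated map has several fixed points, the starting state decides which one the iteration reaches, and that the state stays put once it is there.

```python
import math


def step(z: float) -> float:
    """Toy update rule (an assumption, NOT the HRM update).

    z -> tanh(2z) has three fixed points: 0 and a symmetric pair +/- z*.
    """
    return math.tanh(2.0 * z)


def iterate(z0: float, max_steps: int = 200, tol: float = 1e-10):
    """Apply `step` repeatedly until the state stops changing.

    Returns the final state and the number of steps taken.
    """
    z = z0
    for n in range(1, max_steps + 1):
        z_next = step(z)
        if abs(z_next - z) < tol:
            return z_next, n
        z = z_next
    return z, max_steps


def is_fixed_point(z: float, tol: float = 1e-8) -> bool:
    """A state is a fixed point if one more update leaves it unchanged."""
    return abs(step(z) - z) < tol


# Different starting states settle on different fixed points.
for z0 in (0.3, -0.3, 0.0):
    z_final, steps = iterate(z0)
    print(f"start={z0:+.1f} -> final={z_final:+.4f} "
          f"after {steps} steps, fixed point: {is_fixed_point(z_final)}")
```

In this toy, starting at 0.3 and at -0.3 ends at two different fixed points, and starting exactly at 0 never leaves the third. Which of these would count as the "correct" answer is outside the scope of the toy; the point is only that reaching a fixed point says nothing by itself about reaching the right one.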

Datasets

ThomasHeim/HRM-dataset
dataset · 74 dl

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Intelligent Tutoring Systems and Adaptive Learning