Decoding the Critique Mechanism in Large Reasoning Models
Hoang Phan, Quang H. Nguyen, Hung T. Q. Le, Xiusi Chen, Heng Ji, Khoa D. Doan

TL;DR
This paper investigates how large reasoning models detect and correct their own errors during complex reasoning tasks, revealing a hidden critique mechanism that improves their self-verification and error recovery abilities.
Contribution
It introduces the concept of a critique vector, a latent direction representing the model's internal error detection mechanism, and demonstrates that steering the model's latent representations along this vector enhances self-correction without additional training.
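To make the idea of steering along a vector concrete, here is a minimal sketch. It assumes the direction is estimated as a difference of mean hidden states between two groups of examples and is added to a hidden state with a scale factor; this is a common activation-steering recipe, not necessarily the authors' procedure. The names `difference_of_means`, `steer`, and `alpha` are hypothetical, and the vectors are toy lists rather than real model activations.

```python
from typing import List

Vector = List[float]


def mean_vector(rows: List[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def difference_of_means(with_error: List[Vector], without_error: List[Vector]) -> Vector:
    """Assumed estimator: mean activation on error cases minus mean on clean cases."""
    mu_err = mean_vector(with_error)
    mu_ok = mean_vector(without_error)
    return [a - b for a, b in zip(mu_err, mu_ok)]


def steer(hidden: Vector, direction: Vector, alpha: float) -> Vector:
    """Shift a hidden state along the direction; alpha sets strength and sign."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


# Toy activations (hypothetical values, two dimensions only).
direction = difference_of_means(
    with_error=[[1.0, 2.0], [3.0, 4.0]],
    without_error=[[0.0, 0.0], [2.0, 2.0]],
)
steered = steer([0.0, 0.0], direction, alpha=2.0)
```

In a real model the same shift would be applied to the activations of a chosen layer at inference time, which is why no additional training is involved.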
Findings
Models can reach the correct final answer even when an inserted arithmetic error propagates to an incorrect intermediate conclusion.
Steering latent representations improves error detection.
The critique vector is highly interpretable and effective.
Abstract
Large Reasoning Models (LRMs) exhibit backtracking and self-verification mechanisms that enable them to revise intermediate steps and reach correct solutions, yielding strong performance on complex logical benchmarks. We hypothesize that such behaviors are beneficial only when the model has sufficiently strong "critique" ability to detect its own mistakes. This work systematically investigates how current LRMs recover from errors by inserting arithmetic mistakes in their intermediate reasoning steps. Notably, we discover a peculiar yet important phenomenon: despite the error propagating through the chain-of-thought (CoT), resulting in an incorrect intermediate conclusion, the model still reaches the correct final answer. This recovery implies that the model must possess an internal mechanism to detect errors and trigger self-correction, which we refer to as the hidden critique ability.…
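The error-insertion setup described in the abstract can be illustrated with a small sketch. It assumes reasoning steps contain simple equations of the form `a op b = c` and corrupts the stated result by a fixed offset; the function name `inject_arithmetic_error`, the regular expression, and the offset scheme are hypothetical choices for illustration, not the paper's actual protocol.

```python
import re
from typing import Optional

# Matches simple integer equations such as "12 + 7 = 19" (assumed step format).
EQUATION = re.compile(r"(\d+)\s*([+\-*])\s*(\d+)\s*=\s*(\d+)")


def inject_arithmetic_error(step: str, offset: int = 1) -> Optional[str]:
    """Return the step with the first equation's result shifted by a nonzero offset.

    Returns None when the step contains no equation of the assumed form.
    """
    match = EQUATION.search(step)
    if match is None:
        return None
    wrong = int(match.group(4)) + offset
    start, end = match.span(4)  # character span of the stated result
    return step[:start] + str(wrong) + step[end:]


corrupted = inject_arithmetic_error("So 12 + 7 = 19 apples.")
untouched = inject_arithmetic_error("No equation here.")
```

The corrupted step would then replace the original one in the chain-of-thought, and the model would continue generating from that point so its final answer can be compared against the correct one.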
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Advanced Graph Neural Networks · Topic Modeling
