Understanding and Steering the Cognitive Behaviors of Reasoning Models at Test-Time

Zhenyu Zhang; Xiaoxia Wu; Zhongzhu Zhou; Qingyang Wu; Yineng Zhang; Pragaash Ponnusamy; Harikaran Subbaraj; Jue Wang; Shuaiwen Leon Song; Ben Athiwaratkun

arXiv:2512.24574 · cs.CL · January 21, 2026

Open Access · 3 Reviews

TL;DR

This paper introduces CREST, a training-free method that dynamically steers reasoning trajectories of large language models at test-time, improving accuracy and efficiency by suppressing unproductive reasoning behaviors.

Contribution

The work uncovers specialized attention heads linked to reasoning behaviors and proposes CREST, a novel test-time intervention technique for improving LLM reasoning without additional training.

Findings

01. CREST improves reasoning accuracy by up to 17.5%.

02. CREST reduces token usage by up to 37.6%.

03. CREST enhances reasoning efficiency across various benchmarks.

Abstract

Large Language Models (LLMs) often rely on long chain-of-thought (CoT) reasoning to solve complex tasks. While effective, these trajectories are frequently inefficient, leading to high latency from excessive token generation, or unstable reasoning that alternates between underthinking (shallow, inconsistent steps) and overthinking (repetitive, verbose reasoning). In this work, we study the structure of reasoning trajectories and uncover specialized attention heads that correlate with distinct cognitive behaviors such as verification and backtracking. By lightly intervening on these heads at inference time, we can steer the model away from inefficient modes. Building on this insight, we propose CREST, a training-free method for Cognitive REasoning Steering at Test-time. CREST has two components: (1) an offline calibration step that identifies cognitive heads and derives head-specific…

Peer Reviews

Decision · ICLR 2026 Conference Desk Rejected Submission

Reviewer 01 · Rating 4 · Confidence 4

Strengths

1. The paper offers convincing empirical evidence that internal attention heads of LLMs encode semantically meaningful cognitive behaviors such as non-linear reasoning, supported by systematic linear probing.
2. The proposed CREST framework is lightweight and effective, requiring only offline calibration on small datasets and operating at inference time with negligible overhead. It is training-free and model-agnostic.
3. The results span a wide range of tasks and model scales, demonstrating stro…

Weaknesses

1. The annotation of cognitive behaviors is rather coarse. Classifying linear and non-linear reasoning through surface keywords (e.g., “Wait”, “Alternatively”) may introduce noise.
2. Although the vector adjustments in Equations (4) and (5) are reasonable, there is a lack of intuitive explanation for their geometric or semantic impacts, making it difficult for readers to understand how such rotations affect reasoning patterns.
3. It’s unclear when steering might reduce accuracy by skipping key r…
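To make the first weakness concrete, here is a minimal sketch of keyword-based step labeling. It is an illustration only, not the authors' annotation pipeline: the blank-line step delimiter, the rule that only a step's first word is checked, and the names `NONLINEAR_MARKERS`, `split_steps`, `label_step`, and `label_trace` are all assumptions. Only the two keywords come from the review.

```python
import re

# Assumption: the review names only "Wait" and "Alternatively" as examples;
# the paper's real marker list is not given in this summary.
NONLINEAR_MARKERS = ("wait", "alternatively")


def split_steps(trace):
    """Split a reasoning trace into steps on blank lines (an assumed delimiter)."""
    return [s.strip() for s in re.split(r"\n\s*\n", trace) if s.strip()]


def label_step(step):
    """Return 'non-linear' if the step opens with a marker keyword, else 'linear'."""
    first_word = re.match(r"[A-Za-z]+", step)
    if first_word and first_word.group(0).lower() in NONLINEAR_MARKERS:
        return "non-linear"
    return "linear"


def label_trace(trace):
    """Label every step of a trace, returning (label, step) pairs."""
    return [(label_step(s), s) for s in split_steps(trace)]
```

A rule like this is cheap but brittle, which is the reviewer's point: a step that backtracks without using a listed keyword is labeled linear, and a step that merely opens with "Wait" for unrelated reasons is labeled non-linear.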

Reviewer 02 · Rating 6 · Confidence 4

Strengths

1. The paper comprehensively demonstrates, both quantitatively (see Figure 1, Figure 8-12) and qualitatively, that certain attention heads are strongly predictive of cognitive behaviors such as linear vs. non-linear reasoning steps, a finding that deepens model interpretability.
2. CREST requires only offline calibration (linear probing plus simple covariance projection), imposing negligible runtime overhead and no further model training. Intervention is mathematically straightforward (see Equat…
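The linear probing mentioned above can be sketched as a per-head logistic-regression classifier that predicts the step label from that head's activation vector. This is a toy illustration under stated assumptions, not the paper's calibration code: the function names, the plain batch gradient descent, and the hyperparameters are hypothetical, and the covariance projection step is omitted because this summary does not describe it.

```python
import math


def train_probe(features, labels, lr=0.5, epochs=200):
    """Fit logistic-regression weights with batch gradient descent.

    features: list of activation vectors for one head (assumed input format)
    labels:   1 for a non-linear step, 0 for a linear step
    """
    dim = len(features[0])
    n = len(features)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of label 1
            err = p - y
            for k in range(dim):
                grad_w[k] += err * x[k]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def probe_accuracy(features, labels, w, b):
    """Fraction of examples on the correct side of the probe's decision boundary."""
    correct = 0
    for x, y in zip(features, labels):
        z = sum(wi * xi for wi, xi in zip(w, x)) + b
        correct += int((z > 0) == (y == 1))
    return correct / len(labels)
```

Under this sketch, heads whose probes reach high accuracy would be the candidates for "cognitive heads". A real calibration would score probes on held-out steps rather than on the training set, and, as the reviewers note, high probe accuracy is correlational evidence only.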

Weaknesses

1. While the probing accuracies and covariance analyses are compelling, the theoretical motivation for why cognitive heads manifest at specific layers and how robust these interventions are (especially in models with cross-layer dependencies or strong MoE gating) remains thin. Theoretical claims about head specialization rely mainly on correlational evidence (Figure 1, Figure 8-12) instead of providing more causal or mechanistic arguments. Section 4.1.2 describes projecting to a low-rank subspac…

Reviewer 03 · Rating 4 · Confidence 3

Strengths

S1. The proposed method is effective. It achieves consistent accuracy gains (up to +17.5%) while reducing generation length (up to −37.6%) on several reasoning benchmarks and models.
S2. The discovery of “cognitive heads,” the probing analysis, and activation visualizations provide interpretable insights into internal reasoning mechanisms.
S3. CREST requires no retraining or gradient updates, can be applied across many LLMs, and introduces negligible inference overhead.

Weaknesses

W1. The update $\hat{x}_{i,j} = x_{i,j} - \alpha v_{i,j}$ and its norm-preserving variant lack theoretical justification. It remains unclear why subtracting or orthogonalizing along an empirical mean direction improves reasoning, or how this compares to simpler scaling methods. In addition, the novelty of this intervention is unclear: please position it against existing activation/attention-head steering techniques (e.g., activation engineering / latent-direction editing, manifold steering for o…
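The update quoted in W1 can be written out in a few lines. The sketch below takes $x_{i,j}$ as the activation of head $j$ in layer $i$ and $v_{i,j}$ as that head's steering direction, which is an assumed reading of the indices. `steer` implements the quoted subtraction directly. `steer_norm_preserving` is only one plausible reading of the "norm-preserving variant" (subtract, then rescale to the original norm); the paper's exact form is not given in this summary, and all function names are hypothetical.

```python
import math


def norm(x):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(xi * xi for xi in x))


def steer(x, v, alpha):
    """Apply the quoted update: x_hat = x - alpha * v."""
    return [xi - alpha * vi for xi, vi in zip(x, v)]


def steer_norm_preserving(x, v, alpha):
    """Assumed variant: apply the update, then rescale to the norm of x.

    If the shifted vector is exactly zero it cannot be rescaled,
    so it is returned unchanged.
    """
    shifted = steer(x, v, alpha)
    shifted_norm = norm(shifted)
    if shifted_norm == 0.0:
        return shifted
    scale = norm(x) / shifted_norm
    return [scale * s for s in shifted]
```

With `x = [3.0, 4.0]`, `v = [1.0, 0.0]`, and `alpha = 1.0`, `steer` returns `[2.0, 4.0]`, while the rescaled variant points in the same direction but keeps the original length of 5. Geometrically, the second form rotates the activation away from `v` rather than shrinking it, which may be the "rotation" Reviewer 01 asks the authors to explain.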

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Explainable Artificial Intelligence (XAI)