Decomposing and Steering Functional Metacognition in Large Language Models

Yanshi Li; Xueru Bai; Shuman Liu; Haibo Zhang; Anxiang Zeng

arXiv:2605.08942·cs.CL·May 12, 2026

TL;DR

This paper argues that large language models maintain decomposable internal metacognitive states that influence their reasoning, and shows that steering these states modulates model behavior.

Contribution

The paper introduces a framework for identifying and causally manipulating internal metacognitive states in LLMs, and uses it to show how those states affect reasoning and evaluation.

Findings

01. Metacognitive states are linearly decodable from internal activations.

02. Steering activations along probe directions modulates reasoning behavior.

03. Benchmark performance is influenced by activation of specific internal states.
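
To make the first two findings concrete, here is a minimal sketch of a linear probe and of steering along its direction. It is not the authors' implementation: the activations are synthetic, the probe is a simple difference-of-means direction (chosen here for brevity; the paper's probe may differ), and every name (`fit_mean_diff_probe`, `probe_score`, `steer`, `alpha`, `d_model`) is hypothetical.

```python
import numpy as np

def fit_mean_diff_probe(acts, labels):
    """Fit a difference-of-means linear probe (an assumed, simplified probe).

    Returns a unit direction and a bias such that score = acts @ direction + bias
    is positive for the class-1 side of the midpoint between class means.
    """
    pos_mean = acts[labels == 1].mean(axis=0)
    neg_mean = acts[labels == 0].mean(axis=0)
    direction = pos_mean - neg_mean
    direction = direction / np.linalg.norm(direction)
    bias = -0.5 * (pos_mean + neg_mean) @ direction
    return direction, bias

def probe_score(acts, direction, bias):
    """Signed distance of each activation from the probe's decision boundary."""
    return acts @ direction + bias

def steer(acts, direction, alpha):
    """Shift every activation by alpha units along the probe direction."""
    return acts + alpha * direction

# Synthetic stand-in for residual-stream activations (assumption: no real model).
rng = np.random.default_rng(0)
d_model, n = 64, 400
true_dir = rng.normal(size=d_model)
true_dir /= np.linalg.norm(true_dir)
labels = rng.integers(0, 2, size=n)  # 1 = state "active", 0 = "inactive"
acts = rng.normal(size=(n, d_model)) + 3.0 * labels[:, None] * true_dir

direction, bias = fit_mean_diff_probe(acts, labels)
preds = (probe_score(acts, direction, bias) > 0).astype(int)
accuracy = (preds == labels).mean()

# Steering: push "inactive" examples toward the "active" side of the probe.
inactive = acts[labels == 0]
steered = steer(inactive, direction, alpha=4.0)
shift = (probe_score(steered, direction, bias).mean()
         - probe_score(inactive, direction, bias).mean())

print(f"probe accuracy: {accuracy:.2f}, mean score shift: {shift:.2f}")
```

Because the direction has unit norm, adding `alpha * direction` raises the probe score by `alpha`. In a real model the same addition would be applied to a layer's hidden state during the forward pass, and whether downstream behavior changes is an empirical question this toy example cannot answer.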

Abstract

Large language models (LLMs) increasingly exhibit behaviors suggesting awareness of their evaluation context, often adapting their reasoning strategies in benchmark settings. Prior work has shown that such evaluation awareness can distort performance measurements; however, it remains unclear whether this phenomenon reflects a single behavioral artifact or a deeper internal structure within the model. We propose that LLMs maintain a decomposable space of functional metacognitive states: internal variables encoding factors such as evaluation awareness, self-assessed capability, perceived risk, computational effort allocation, audience expertise adaptation, and intentionality. Through residual stream analysis across multiple reasoning models, we demonstrate that these states are linearly decodable from internal activations and exhibit distinct layer-wise profiles. Moreover, by steering…
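
The abstract's "layer-wise profiles" can be illustrated with a hedged sketch: fit one linear probe per layer and record held-out accuracy. Everything here is an assumption for illustration. The activations are synthetic, the per-layer signal strengths in `strengths` are invented, the probe is an ordinary least-squares classifier, and `probe_accuracy` is a hypothetical helper, not part of the paper's code.

```python
import numpy as np

def probe_accuracy(train_x, train_y, test_x, test_y):
    """Held-out accuracy of a least-squares linear probe with a bias term."""
    xtr = np.hstack([train_x, np.ones((len(train_x), 1))])
    xte = np.hstack([test_x, np.ones((len(test_x), 1))])
    # Regress onto targets in {-1, +1}; classify by the sign of the prediction.
    w, *_ = np.linalg.lstsq(xtr, 2.0 * train_y - 1.0, rcond=None)
    preds = (xte @ w > 0).astype(int)
    return float((preds == test_y).mean())

rng = np.random.default_rng(0)
n, d_model = 600, 32
labels = rng.integers(0, 2, size=n)
signal_dir = rng.normal(size=d_model)
signal_dir /= np.linalg.norm(signal_dir)

# Invented signal strength per layer: absent at layer 0, peaking at layer 3.
strengths = [0.0, 1.0, 2.0, 4.0, 2.0, 1.0]

profile = []
split = n // 2
for strength in strengths:
    # Synthetic "layer activations": Gaussian noise plus a label-dependent shift.
    acts = rng.normal(size=(n, d_model)) + strength * labels[:, None] * signal_dir
    profile.append(
        probe_accuracy(acts[:split], labels[:split], acts[split:], labels[split:])
    )

best_layer = int(np.argmax(profile))
print("accuracy by layer:", [round(a, 2) for a in profile])
print("most decodable layer:", best_layer)
```

With real activations, `acts` would be replaced by the hidden states captured at each layer for the same set of prompts, and a separate profile would be computed for each metacognitive state.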

Code & Models

Repositories

xlands/meta-cognition (GitHub)
