From Latent Signals to Reflection Behavior: Tracing Meta-Cognitive Activation Trajectory in R1-Style LLMs

Yanrui Du; Yibo Gao; Sendong Zhao; Jiayun Li; Haochun Wang; Qika Lin; Kai He; Bing Qin; Mengling Feng

arXiv:2602.01999·cs.CL·February 6, 2026

Open Access

TL;DR

This paper traces the layer-wise activations of R1-style LLMs at the onset of self-reflection, revealing a structured progression from latent control to overt reflection behavior, and uses targeted interventions to show that these stages are causally linked.

Contribution

The paper uncovers the layer-wise activation trajectory and the causal chain underlying self-reflection in R1-style LLMs, offering insight into their meta-cognitive process.

Findings

01. A structured layer-wise activation progression is identified: latent-control, semantic-pivot, and behavior-overt layers.

02. A causal chain links prompt-level semantics to reflection-behavior tokens.

03. Targeted interventions demonstrate control over reflection behavior.

Abstract

R1-style LLMs have attracted growing attention for their capacity for self-reflection, yet the internal mechanisms underlying such behavior remain unclear. To bridge this gap, we anchor on the onset of reflection behavior and trace its layer-wise activation trajectory. Using the logit lens to read out token-level semantics, we uncover a structured progression: (i) Latent-control layers, where an approximate linear direction encodes the semantics of thinking budget; (ii) Semantic-pivot layers, where discourse-level cues, including turning-point and summarization cues, surface and dominate the probability mass; and (iii) Behavior-overt layers, where the likelihood of reflection-behavior tokens begins to rise until they become highly likely to be sampled. Moreover, our targeted interventions uncover a causal chain across these stages: prompt-level semantics modulate the projection of…
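
The logit-lens readout mentioned in the abstract can be illustrated with a toy example. The sketch below is not the authors' implementation: the three-token vocabulary, the identity unembedding matrix, and the per-layer hidden states are all hypothetical values chosen so that a turning-point cue ("but") dominates at a middle layer and a reflection-style token ("wait") dominates at the last one. It also omits the final layer normalization that a logit lens on a real model would typically apply before unembedding.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def logit_lens(hidden_states, unembed):
    """Project each layer's hidden state through the unembedding matrix.

    hidden_states: one length-d vector per layer (intermediate residual states)
    unembed: one length-d row per vocabulary token
    Returns one probability distribution over the vocabulary per layer.
    """
    per_layer = []
    for h in hidden_states:
        logits = [sum(w * x for w, x in zip(row, h)) for row in unembed]
        per_layer.append(softmax(logits))
    return per_layer

# Hypothetical toy vocabulary and weights (assumptions, not from the paper).
VOCAB = ["the", "but", "wait"]
UNEMBED = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
# Hypothetical hidden states at an early, a middle, and a late layer.
HIDDEN_STATES = [
    [2.0, 0.0, 0.0],
    [0.0, 2.0, 0.5],
    [0.0, 0.5, 3.0],
]

layer_probs = logit_lens(HIDDEN_STATES, UNEMBED)

# Most likely token at each layer, as read out by the lens.
top_tokens = [VOCAB[p.index(max(p))] for p in layer_probs]

# Probability assigned to the reflection-style token at each layer.
wait_probs = [p[VOCAB.index("wait")] for p in layer_probs]

for layer, (tok, pw) in enumerate(zip(top_tokens, wait_probs)):
    print(f"layer {layer}: top token = {tok!r}, P('wait') = {pw:.3f}")
```

In this toy setup the top token shifts across layers and the probability of "wait" rises monotonically, which mirrors the kind of layer-wise trajectory the abstract describes. On a real model the hidden states would come from the network's intermediate layers and the unembedding matrix from its output head.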

Taxonomy

Topics: Topic Modeling · Intelligent Tutoring Systems and Adaptive Learning · Explainable Artificial Intelligence (XAI)