Unveiling the Latent Directions of Reflection in Large Language Models

Fu-Chieh Chang; Yu-Ting Lee; Pei-Yuan Wu

arXiv:2508.16989 · cs.LG · December 12, 2025

TL;DR

This paper explores the internal mechanisms of reflection in large language models by analyzing activation directions, demonstrating how reflective behavior can be systematically identified and controlled through activation steering techniques.

Contribution

It introduces a novel activation steering methodology to characterize and manipulate reflection levels in LLMs, advancing understanding of their internal reflective processes.

Findings

01

Reflection can be systematically induced or suppressed via activation interventions.

02

Steering vectors effectively differentiate reflection levels in model activations.

03

Suppressing reflection is easier than stimulating it in LLMs.
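
As a rough illustration of what an "activation intervention" means in the findings above, the sketch below adds a scaled direction to a hidden-state vector. It is a minimal toy under stated assumptions, not the paper's implementation: the names `steer` and `projection`, the three-dimensional vectors, and the coefficient `alpha` are all hypothetical, and a real model would apply the shift to a chosen layer's activations during the forward pass.

```python
import math

def projection(hidden, direction):
    """Scalar component of `hidden` along the unit-normalized `direction`."""
    norm = math.sqrt(sum(d * d for d in direction))
    return sum(h * d for h, d in zip(hidden, direction)) / norm

def steer(hidden, direction, alpha):
    """Shift `hidden` by `alpha` units along the normalized `direction`.

    alpha > 0 pushes the activation toward the direction (hypothetically,
    more reflection); alpha < 0 pushes it away (less reflection).
    """
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    return [h + alpha * d / norm for h, d in zip(hidden, direction)]

# Toy values, invented for illustration only.
hidden = [0.5, -1.0, 2.0]
direction = [3.0, 0.0, 4.0]

enhanced = steer(hidden, direction, alpha=2.0)
suppressed = steer(hidden, direction, alpha=-2.0)

print(projection(hidden, direction))
print(projection(enhanced, direction))
print(projection(suppressed, direction))
```

The sign of `alpha` is the only difference between the two interventions here. The toy says nothing about which direction is easier to move a real model in; that asymmetry is an empirical finding of the paper.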

Abstract

Reflection, the ability of large language models (LLMs) to evaluate and revise their own reasoning, has been widely used to improve performance on complex reasoning tasks. Yet most prior work emphasizes designing reflective prompting strategies or reinforcement learning objectives, leaving the inner mechanisms of reflection underexplored. In this paper, we investigate reflection through the lens of latent directions in model activations. We propose a methodology based on activation steering to characterize instructions with different reflective intentions: no reflection, intrinsic reflection, and triggered reflection. By constructing steering vectors between these reflection levels, we demonstrate that (1) new reflection-inducing instructions can be systematically identified, (2) reflective behavior can be directly enhanced or suppressed through activation interventions, and (3)…
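
To make "constructing steering vectors between these reflection levels" concrete, here is a minimal sketch that uses a difference of mean activations, a common construction in the activation steering literature. The abstract as shown here does not specify the authors' exact procedure, so this is an assumption. The function names, the two-dimensional toy activations, and the candidate labels are all hypothetical.

```python
def mean_vector(rows):
    """Element-wise mean of a list of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def steering_vector(source_acts, target_acts):
    """Difference of means: points from the source level to the target level."""
    src = mean_vector(source_acts)
    tgt = mean_vector(target_acts)
    return [t - s for s, t in zip(src, tgt)]

def score(activation, vector):
    """Dot product of an activation with the steering vector."""
    return sum(a * v for a, v in zip(activation, vector))

# Toy activations, invented for illustration: one row per prompt.
no_reflection = [[0.0, 1.0], [0.2, 0.8]]
triggered_reflection = [[1.0, 1.0], [1.2, 1.2]]

v = steering_vector(no_reflection, triggered_reflection)

# Rank hypothetical candidate instructions by alignment with the vector.
candidates = {"candidate_a": [0.1, 0.9], "candidate_b": [0.9, 1.0]}
ranked = sorted(candidates, key=lambda k: score(candidates[k], v), reverse=True)
print(v, ranked)
```

Ranking candidates by their projection onto `v` is one plausible reading of how new reflection-inducing instructions could be identified. Treat it as a sketch of the idea rather than the paper's scoring rule.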
