Internalizing LLM Reasoning via Discovery and Replay of Latent Actions

Zhenning Shi; Yijia Zhu; Junhan Shi; Xun Zhang; Lei Wang; Congcong Miao

arXiv:2602.04925·cs.LG·February 6, 2026

Open Access

TL;DR

This paper introduces STIR, a dynamic latent trajectory control framework that internalizes chain-of-thought reasoning into a language model's hidden states, improving both accuracy and token efficiency on complex reasoning tasks.

Contribution

STIR reformulates reasoning enhancement as a dynamic latent trajectory control problem, internalizing reasoning in the model's hidden states to improve accuracy and reduce token usage.

Findings

01 Improves accuracy by up to 7.5% on benchmarks.

02 Reduces token consumption by up to 35%.

03 Demonstrates effectiveness across multiple models and tasks.

Abstract

The internalization of chain-of-thought processes into hidden states has emerged as a highly efficient paradigm for scaling test-time compute. However, existing activation steering methods rely on static control vectors that fail to adapt to the non-stationary evolution of complex reasoning tasks. To address this limitation, we propose STIR (Self-Distilled Tools for Internal Reasoning), a framework that reformulates reasoning enhancement as a dynamic latent trajectory control problem. STIR introduces a synergistic three-stage pipeline: (1) differential intrinsic action induction harvests latent reasoning successes to crystallize steering primitives; (2) sparse control basis construction curates a compact, geometrically diverse tool library; and (3) value-modulated trajectory intervention dynamically injects context-specific impulses via anchor-based gating. Extensive experiments on six…
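As a rough illustration of the three stages named in the abstract, the following Python sketch shows one way such a pipeline could be wired together. It is not the authors' implementation: the abstract does not give the formulas, so the difference-of-states induction, the cosine-similarity pruning rule, the threshold gate, and every function and parameter name here (`induce_action`, `build_sparse_basis`, `gated_intervention`, `max_similarity`, `threshold`, `scale`) are assumptions chosen for clarity.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return 0.0 if na == 0 or nb == 0 else dot(a, b) / (na * nb)


def induce_action(success_state, baseline_state):
    """Stage 1 (assumed form): a candidate steering vector taken as the
    difference between a hidden state from a successful reasoning trace
    and a baseline hidden state at the same position."""
    return [s - b for s, b in zip(success_state, baseline_state)]


def build_sparse_basis(candidates, max_tools, max_similarity=0.9):
    """Stage 2 (assumed form): greedily keep candidates whose absolute
    cosine similarity to every kept vector stays below max_similarity,
    giving a small, directionally diverse library."""
    kept = []
    for cand in candidates:
        if len(kept) >= max_tools:
            break
        if all(abs(cosine(cand, k)) < max_similarity for k in kept):
            kept.append(cand)
    return kept


def gated_intervention(hidden, anchors, tools, values, threshold=0.5, scale=1.0):
    """Stage 3 (assumed form): compare the current hidden state with each
    tool's anchor; if the best match clears the threshold, add that tool
    scaled by its value estimate, otherwise leave the state unchanged.
    Returns (new_hidden, chosen_index_or_None)."""
    if not anchors:
        return list(hidden), None
    sims = [cosine(hidden, a) for a in anchors]
    best = max(range(len(sims)), key=sims.__getitem__)
    if sims[best] < threshold:
        return list(hidden), None  # gate closed: no injection
    strength = scale * values[best]
    steered = [h + strength * t for h, t in zip(hidden, tools[best])]
    return steered, best
```

The contrast with a static control vector is in the last function: which vector is injected, and whether anything is injected at all, depends on the hidden state at that step. In a real model the intervention would presumably run inside a forward hook on a chosen layer; that wiring is omitted here.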

Taxonomy

Topics: Robot Manipulation and Learning · Multimodal Machine Learning Applications · Action Observation and Synchronization