Internalizing LLM Reasoning via Discovery and Replay of Latent Actions
Zhenning Shi, Yijia Zhu, Junhan Shi, Xun Zhang, Lei Wang, Congcong Miao

TL;DR
This paper introduces STIR, a dynamic latent trajectory control framework that internalizes chain-of-thought reasoning in the hidden states of language models, improving both accuracy and efficiency on complex reasoning tasks.
Contribution
STIR reformulates reasoning enhancement as a dynamic latent trajectory control problem, internalizing reasoning to improve accuracy and reduce token usage in language models.
Findings
Improves accuracy by up to 7.5% on benchmarks.
Reduces token consumption by up to 35%.
Demonstrates effectiveness across multiple models and tasks.
Abstract
The internalization of chain-of-thought processes into hidden states has emerged as a highly efficient paradigm for scaling test-time compute. However, existing activation steering methods rely on static control vectors that fail to adapt to the non-stationary evolution of complex reasoning tasks. To address this limitation, we propose STIR (Self-Distilled Tools for Internal Reasoning), a framework that reformulates reasoning enhancement as a dynamic latent trajectory control problem. STIR introduces a synergistic three-stage pipeline: (1) differential intrinsic action induction harvests latent reasoning successes to crystallize steering primitives; (2) sparse control basis construction curates a compact, geometrically diverse tool library; and (3) value-modulated trajectory intervention dynamically injects context-specific impulses via anchor-based gating. Extensive experiments on six…
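To contrast a static control vector with the context-dependent intervention described in the abstract, here is a minimal sketch in plain Python. It is not the authors' implementation: this summary does not give the actual induction, value-modulation, or gating functions, so cosine similarity to an anchor is used as a stand-in gate, and greedy cosine filtering as a stand-in for building a geometrically diverse library. The names `cosine`, `select_diverse`, `gated_steer`, `threshold`, `alpha`, and `max_cos` are all hypothetical.

```python
import math


def cosine(u, v, eps=1e-8):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)


def select_diverse(candidates, k, max_cos=0.9):
    """Greedily keep up to k candidates that are not near-duplicates.

    Assumed stand-in for a 'compact, geometrically diverse' library:
    a candidate is kept only if its absolute cosine similarity to
    every vector kept so far is below max_cos.
    """
    kept = []
    for i, cand in enumerate(candidates):
        if len(kept) == k:
            break
        if all(abs(cosine(cand, candidates[j])) < max_cos for j in kept):
            kept.append(i)
    return kept


def gated_steer(h, anchors, tools, threshold=0.5, alpha=1.0):
    """Inject at most one steering vector into hidden state h.

    Assumed gate: pick the tool whose anchor is most similar to h,
    and apply it only if that similarity reaches the threshold.
    A static method would instead add the same vector at every step.
    """
    sims = [cosine(h, a) for a in anchors]
    best = max(range(len(sims)), key=sims.__getitem__)
    if sims[best] < threshold:
        return list(h), None  # no anchor matches: leave h untouched
    gate = alpha * sims[best]
    steered = [x + gate * t for x, t in zip(h, tools[best])]
    return steered, best
```

Calling `gated_steer` once per decoding step with the current hidden state lets the injected vector change, or vanish, as the reasoning trajectory moves, which is the property a single fixed control vector lacks.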
Taxonomy
Topics: Robot Manipulation and Learning · Multimodal Machine Learning Applications · Action Observation and Synchronization
