TL;DR
This paper reveals that a small set of activations in large language models largely controls long chain-of-thought reasoning, and introduces a training-free activation control method to efficiently elicit and enhance this ability.
Contribution
It uncovers the internal activation mechanisms behind long CoT reasoning and proposes a novel, training-free activation control technique to improve reasoning without costly fine-tuning.
Findings
Amplifying key activations boosts long CoT reasoning.
Activation dynamics follow predictable trajectories.
Training-free activation control improves reasoning performance.
Abstract
Although long chain-of-thought (CoT) reasoning delivers remarkable performance, eliciting this ability in large language models (LLMs) typically requires costly reinforcement learning or supervised fine-tuning on high-quality distilled data. We investigate the internal mechanisms behind this capability and show that a small set of high-impact activations in the last few layers largely governs long-form reasoning attributes, such as output length and self-reflection. By simply amplifying these activations and inserting "wait" tokens, we can invoke the long CoT ability without any training, resulting in significantly increased self-reflection rates and accuracy. Moreover, we find that the activation dynamics follow predictable trajectories, with a sharp rise after special tokens and a subsequent exponential decay. Building on these insights, we introduce a general training-free activation…
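The two ideas in the abstract, amplifying a small set of activations and a rise-then-exponential-decay trajectory, can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the choice of "largest magnitude" as the selection rule, and the values of `peak` and `decay_rate` are all hypothetical, chosen only to show the shape of such an intervention on a toy activation vector.

```python
import math


def decayed_scale(steps_since_trigger, peak=2.0, decay_rate=0.5):
    """Hypothetical amplification factor.

    Equals `peak` right after a trigger token (for example "wait")
    and decays exponentially toward 1.0 (no amplification).
    """
    return 1.0 + (peak - 1.0) * math.exp(-decay_rate * steps_since_trigger)


def amplify_top_k(activations, k, scale):
    """Multiply the k largest-magnitude activations by `scale`.

    Selecting by magnitude is an assumption made for illustration;
    all other activations are returned unchanged.
    """
    ranked = sorted(
        range(len(activations)),
        key=lambda i: abs(activations[i]),
        reverse=True,
    )
    selected = set(ranked[:k])
    return [a * scale if i in selected else a for i, a in enumerate(activations)]


if __name__ == "__main__":
    # Toy hidden-state vector standing in for one late-layer activation.
    hidden = [0.1, -3.0, 0.2, 2.5, -0.05]
    for step in range(4):
        scale = decayed_scale(step)
        print(step, round(scale, 3), amplify_top_k(hidden, k=2, scale=scale))
```

In a real model, an intervention of this kind would typically be applied to the hidden states of selected layers during decoding, with the scale reset each time a trigger token is generated. The layer choice and parameter values would have to come from the paper itself.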
Taxonomy
Methods: Sparse Evolutionary Training
