Activation Control for Efficiently Eliciting Long Chain-of-thought Ability of Language Models

Zekai Zhao; Qi Liu; Kun Zhou; Zihan Liu; Yifei Shao; Zhiting Hu; Biwei Huang

arXiv:2505.17697·cs.CL·May 26, 2025

TL;DR

This paper reveals that a small set of activations in large language models largely controls long chain-of-thought reasoning, and introduces a training-free activation control method to efficiently elicit and enhance this ability.

Contribution

The paper identifies the internal activation patterns associated with long CoT reasoning and proposes a training-free activation control technique that improves reasoning without costly fine-tuning.

Findings

01. Amplifying key activations boosts long CoT reasoning.

02. Activation dynamics follow predictable trajectories.

03. Training-free activation control improves reasoning performance.

Abstract

Despite the remarkable reasoning performance, eliciting the long chain-of-thought (CoT) ability in large language models (LLMs) typically requires costly reinforcement learning or supervised fine-tuning on high-quality distilled data. We investigate the internal mechanisms behind this capability and show that a small set of high-impact activations in the last few layers largely governs long-form reasoning attributes, such as output length and self-reflection. By simply amplifying these activations and inserting "wait" tokens, we can invoke the long CoT ability without any training, resulting in significantly increased self-reflection rates and accuracy. Moreover, we find that the activation dynamics follow predictable trajectories, with a sharp rise after special tokens and a subsequent exponential decay. Building on these insights, we introduce a general training-free activation…
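The amplification idea in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the abstract is truncated here and does not give the exact rule, so the function names (`decay_scale`, `amplify`, `control_sequence`), the `peak` and `rate` parameters, and the plain-list activation vectors are all assumptions made for illustration. The sketch multiplies a chosen set of activation entries by a factor that jumps to `peak` at a trigger token such as "wait" and then decays exponentially back toward 1.

```python
import math


def decay_scale(steps_since_trigger, peak=2.0, rate=0.5):
    """Hypothetical schedule: `peak` at the trigger, decaying exponentially to 1.0."""
    return 1.0 + (peak - 1.0) * math.exp(-rate * steps_since_trigger)


def amplify(activations, key_indices, scale):
    """Return a copy of `activations` with the entries at `key_indices` scaled."""
    out = list(activations)
    for i in key_indices:
        out[i] *= scale
    return out


def control_sequence(per_token_activations, tokens, key_indices,
                     trigger="wait", peak=2.0, rate=0.5):
    """Apply the decaying amplification to each token's activation vector.

    Positions before the first trigger token are left unchanged (scale 1.0).
    Each trigger token resets the schedule to its peak.
    """
    controlled = []
    steps = None  # None until the first trigger token has been seen
    for tok, acts in zip(tokens, per_token_activations):
        if tok == trigger:
            steps = 0
        elif steps is not None:
            steps += 1
        scale = 1.0 if steps is None else decay_scale(steps, peak, rate)
        controlled.append(amplify(acts, key_indices, scale))
    return controlled


# Toy usage: three tokens, two-dimensional activations, index 1 is the "key" entry.
tokens = ["a", "wait", "b"]
acts = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
result = control_sequence(acts, tokens, key_indices=[1])
```

In a real model the scaling would be applied to hidden states inside the forward pass (for example through a framework hook on the last few layers) rather than to Python lists, and which activations count as "key" would have to be identified empirically, as the paper describes.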

Taxonomy

Methods: Sparse Evolutionary Training