Activation Steering for Chain-of-Thought Compression
Seyedarmin Azizi, Erfan Baghaei Potraghloo, Massoud Pedram

TL;DR
This paper introduces Activation-Steered Compression (ASC), a training-free method that shortens chains of thought in large language models by steering residual-stream activations from a verbose reasoning mode toward a concise one, leading to faster reasoning with minimal accuracy loss.
Contribution
We propose a novel inference-time technique, ASC, that compresses reasoning traces in LLMs by manipulating residual-stream activations without retraining.
Findings
Achieves up to 67.43% reduction in CoT length on benchmark datasets.
Delivers an average 2.73x speedup in reasoning time on an 8B model.
Maintains accuracy across multiple model sizes and datasets.
Abstract
Large language models (LLMs) excel at complex reasoning when they include intermediate steps, known as "chains of thought" (CoTs). However, these rationales are often overly verbose, even for simple problems, leading to wasted context, increased latency, and higher energy consumption. We observe that verbose, English-heavy CoTs and concise, math-centric CoTs occupy distinct regions in the model's residual-stream activation space. By extracting and injecting a "steering vector" to transition between these modes, we can reliably shift generation toward more concise reasoning, effectively compressing CoTs without retraining. We formalize this approach as Activation-Steered Compression (ASC), an inference-time technique that shortens reasoning traces by directly modifying hidden representations. In addition, we provide a theoretical analysis of the impact of ASC on the output distribution,…
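The extract-and-inject idea described in the abstract can be illustrated with a minimal sketch. The excerpt does not spell out how the steering vector is computed, so this example assumes a simple mean-difference construction (the mean activation of concise CoTs minus the mean activation of verbose CoTs), which is a common choice in the activation-steering literature and not necessarily the authors' exact procedure. The function names `steering_vector` and `steer`, the `strength` parameter, and the synthetic activations are all hypothetical and stand in for real residual-stream activations captured at one layer.

```python
import numpy as np


def steering_vector(verbose_acts, concise_acts):
    """Assumed construction: difference of mean activations.

    Both inputs have shape (num_examples, d_model) and hold activations
    collected at the same layer for verbose and concise CoTs.
    """
    return concise_acts.mean(axis=0) - verbose_acts.mean(axis=0)


def steer(hidden, vector, strength=1.0):
    """Add the scaled steering vector to every token position.

    `hidden` has shape (num_tokens, d_model); `vector` broadcasts over tokens.
    """
    return hidden + strength * vector


# Synthetic stand-ins for captured activations (not real model data).
rng = np.random.default_rng(0)
d_model = 16
offset = np.zeros(d_model)
offset[0] = 3.0  # pretend the concise mode is shifted along one axis

verbose_acts = rng.normal(size=(32, d_model))
concise_acts = rng.normal(size=(32, d_model)) + offset

v = steering_vector(verbose_acts, concise_acts)

# Apply the vector to a hypothetical 5-token hidden state at inference time.
hidden = rng.normal(size=(5, d_model))
steered = steer(hidden, v, strength=0.5)
```

In a real model the addition would typically happen inside a forward hook on the chosen layer, and `strength` would need tuning: too small has little effect on CoT length, while too large can push activations off-distribution and hurt accuracy.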
Taxonomy
Topics: Functional Brain Connectivity Studies · Mental Health Research Topics
