Fine-Grained Activation Steering: Steering Less, Achieving More

Zijian Feng; Tianjiao Li; Zixiao Zhu; Hanzhang Zhou; Junlang Qian; Li Zhang; Jia Jim Deryl Chua; Lee Onn Mak; Gee Wah Ng; Kezhi Mao

arXiv:2602.04428·cs.CL·February 5, 2026

Open Access · 3 Reviews

TL;DR

This paper introduces AUSteer, a fine-grained activation steering method for large language models that improves efficiency and effectiveness by targeting atomic units rather than entire blocks, leading to better control with less intervention.

Contribution

The paper reveals the heterogeneity of block activations and proposes a novel AU-level steering method, AUSteer, which enhances precision and reduces unnecessary modifications in LLM behavior.

Findings

01

AUSteer outperforms existing methods across multiple tasks and models.

02

Steering at the AU level requires fewer activations while achieving better control.

03

Heterogeneity in block activations causes coarse steering and is mitigated by fine-grained intervention.

Abstract

Activation steering has emerged as a cost-effective paradigm for modifying large language model (LLM) behaviors. Existing methods typically intervene at the block level, steering the bundled activations of selected attention heads, feedforward networks, or residual streams. However, we reveal that block-level activations are inherently heterogeneous, entangling beneficial, irrelevant, and harmful features, thereby rendering block-level steering coarse, inefficient, and intrusive. To investigate the root cause, we decompose block activations into fine-grained atomic unit (AU)-level activations, where each AU-level activation corresponds to a single dimension of the block activation, and each AU denotes a slice of the block weight matrix. Steering an AU-level activation is thus equivalent to steering its associated AU. Our theoretical and empirical analyses show that heterogeneity arises…
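
The decomposition described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: it assumes the block activation `h` feeds a linear map with weight matrix `W`, treats each row of `W` as one AU, and uses a simple additive shift `alpha` as the steering operation. The sizes, the selected indices, and the helper name `steer_aus` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: an 8-dimensional block activation projected to 4 dimensions.
d_block, d_model = 8, 4
W = rng.normal(size=(d_block, d_model))  # assumed block weight matrix
h = rng.normal(size=d_block)             # block activation for one token

# Each AU is one row (slice) of W; its AU-level activation is the scalar h[i].
# The block output is the sum of the per-AU contributions h[i] * W[i].
au_contributions = h[:, None] * W        # shape (d_block, d_model)
block_output = h @ W
decomposition_holds = np.allclose(au_contributions.sum(axis=0), block_output)


def steer_aus(activation, indices, alpha):
    """Add alpha to the selected dimensions only (assumed additive steering)."""
    steered = activation.copy()
    steered[indices] += alpha
    return steered


# Steer only two of the eight AU-level activations (indices chosen arbitrarily).
selected = np.array([1, 5])
alpha = 2.0
h_steered = steer_aus(h, selected, alpha)

# The output moves only along the rows of W that belong to the selected AUs.
shift = h_steered @ W - block_output
expected_shift = alpha * W[selected].sum(axis=0)
```

Under these assumptions, changing one dimension of `h` by `alpha` shifts the output by `alpha * W[i]`, which is the sense in which steering an AU-level activation is equivalent to steering its associated AU. Block-level steering would correspond to modifying every entry of `h` at once.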

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 6 · Confidence 3

Strengths

1. They first use two sections to recognize and interpret the heterogeneity in block activation, which gives insight and inspiration for AUSteer.
2. The method is natural and effective.
3. The experiments are comprehensive, spanning three LLMs with different architectures and three different tasks.

Weaknesses

1. The largest model used is 27B. Evaluating AUSteer on larger models (e.g., 32B and 72B), sparse models (e.g., MoE), and even multimodal models would be better.
2. The optimal hyperparameters \alpha and k are task-specific; how should they be set for each task? And which hyperparameter values were used in Table 1?

Reviewer 02 · Rating 4 · Confidence 4

Strengths

- Clearly identifies a fundamental issue, heterogeneity in block activations, and systematically decomposes it into atomic units.
- Introduces the concept of activation momentum to measure discriminative importance without training.
- Extensive experiments across three model families (LLaMA, Gemma, Qwen) and multiple tasks (reasoning, math, safety, alignment).
- No retraining or fine-tuning required.
- Ablation studies isolate the contribution of both components.

Weaknesses

- The formal derivation connecting activation momentum to discriminative causality is unclear.
- AUSteer requires carefully curated positive–negative pairs, which may not be available or trivial to construct for all tasks.
- While steering itself is efficient, computing activation momentum across many AUs and samples may still be computationally intensive for very large models.
- Hyperparameter sensitivity is unclear and needs further demonstrations and explanations.
- One wonders what is th

Reviewer 03 · Rating 4 · Confidence 4

Strengths

The problem is clearly defined and relevant. The idea of decomposing block activations into AUs is intuitive and well motivated. AUSteer is simple, interpretable, and does not require retraining. The experiments are broad and consistent across tasks and models, and the analysis convincingly shows heterogeneity within block activations.

Weaknesses

1) Efficiency claim lacks evidence: The paper’s argument that a smaller steering footprint improves efficiency is not empirically verified. No inference-time or computational measurements are provided, and efficiency is used only in a representational sense.
2) Lack of comparison with broader control variants: The paper assumes that steering only a subset of AUs is inherently superior, but does not test a broader or fully generalized steering scheme where all AUs are jointly optimized or se

Taxonomy

Topics: Topic Modeling · Machine Learning in Materials Science · Natural Language Processing Techniques