Semantics-Adaptive Activation Intervention for LLMs via Dynamic Steering Vectors
Weixuan Wang, Jingyuan Yang, Wei Peng

TL;DR
This paper introduces SADI, a dynamic, semantics-aware activation intervention method for LLMs that improves task performance by adaptively steering model behavior at inference time without retraining.
Contribution
SADI is the first method to construct a dynamic, input-semantics-dependent steering vector for activation intervention, improving adaptability and effectiveness over methods that use a fixed steering vector.
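The contrast between a fixed and an input-dependent steering vector can be sketched as follows. The notation is our own, not the paper's: a(x) is the activation at a chosen location for input x, v is a precomputed vector, α and δ are scalar strengths, and m is a binary mask over activation elements. The mask-times-activation form of the dynamic vector is an illustrative assumption, not a formula quoted from the paper.

```latex
\begin{aligned}
\text{fixed:}\quad & \tilde{a}(x) = a(x) + \alpha\, v,
  && v \text{ is the same for every input } x \\
\text{input-dependent:}\quad & \tilde{a}(x) = a(x) + \delta\,\big(m \odot a(x)\big),
  && m \in \{0,1\}^{d} \text{ is chosen once from contrastive pairs}
\end{aligned}
```

In the second form the added vector changes with x, so the intervention follows the semantics of each input rather than applying a single global shift.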
Findings
SADI significantly outperforms baseline intervention methods.
SADI improves task performance without additional training.
SADI is applicable across various LLM architectures and tasks.
Abstract
Large language models (LLMs) have achieved remarkable performance across many tasks, yet aligning them with desired behaviors remains challenging. Activation intervention has emerged as an effective and economical method for modifying the behavior of LLMs. Despite considerable interest in this area, current intervention methods exclusively employ a fixed steering vector to modify model activations and therefore cannot adapt to diverse input semantics. To address this limitation, we propose Semantics-Adaptive Dynamic Intervention (SADI), a novel method that constructs a dynamic steering vector to intervene on model activations at inference time. More specifically, SADI uses activation differences between contrastive pairs to precisely identify critical elements of an LLM (i.e., attention heads, hidden states, and neurons) for targeted intervention. During inference, SADI dynamically steers model…
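The two stages described in the abstract, identifying critical elements from contrastive pairs and then steering at inference time, can be illustrated with a minimal sketch. Several assumptions apply here and are not taken from the paper. Activations are represented as flat lists of floats rather than per-head or per-neuron tensors from a real model. Elements are selected by top-k absolute mean difference. The steering vector is the input's own activation restricted to the selected elements and scaled by a strength `delta`. All function names and parameters (`mean_activation_difference`, `top_k_mask`, `dynamic_intervention`, `k`, `delta`) are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def mean_activation_difference(pos: Sequence[Vector], neg: Sequence[Vector]) -> Vector:
    """Element-wise mean of (positive - negative) activations over contrastive pairs."""
    if not pos or len(pos) != len(neg):
        raise ValueError("need a non-empty, equal number of positive and negative activations")
    dim = len(pos[0])
    total = [0.0] * dim
    for p, n in zip(pos, neg):
        for i in range(dim):
            total[i] += p[i] - n[i]
    return [t / len(pos) for t in total]


def top_k_mask(diff: Vector, k: int) -> List[int]:
    """Binary mask marking the k elements with the largest absolute mean difference.

    Assumption: top-k by magnitude stands in for the paper's element-selection rule.
    """
    ranked = sorted(range(len(diff)), key=lambda i: abs(diff[i]), reverse=True)
    keep = set(ranked[:k])
    return [1 if i in keep else 0 for i in range(len(diff))]


def dynamic_intervention(activation: Vector, mask: List[int], delta: float) -> Vector:
    """Add an input-dependent steering vector: this input's own masked activation, scaled by delta.

    Assumption: the steering vector is mask * activation; it changes with every input,
    unlike a fixed vector that is added identically to all inputs.
    """
    steering = [m * a for m, a in zip(mask, activation)]
    return [a + delta * s for a, s in zip(activation, steering)]


# Identify critical elements once, offline, from contrastive pairs.
pos = [[1.0, 0.25, 3.0], [1.5, 0.0, 2.5]]
neg = [[0.0, 0.25, 3.0], [0.5, 0.0, 2.75]]
mask = top_k_mask(mean_activation_difference(pos, neg), k=2)

# At inference time, steer each new input with a vector derived from its own activation.
steered = dynamic_intervention([0.5, 0.25, 2.0], mask, delta=0.5)
print(mask, steered)  # [1, 0, 1] [0.75, 0.25, 3.0]
```

The mask is computed once, while the steering vector is recomputed for every input, which is what makes the intervention semantics-adaptive in this sketch. A fixed-vector baseline would instead add the same `mean_activation_difference` output to every input.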
Taxonomy
Topics: Natural Language Processing Techniques · Semantic Web and Ontologies · Intelligent Tutoring Systems and Adaptive Learning
Methods: Softmax · Attention Is All You Need
