Semantics-Adaptive Activation Intervention for LLMs via Dynamic Steering Vectors

Weixuan Wang; Jingyuan Yang; Wei Peng

arXiv:2410.12299 · cs.CL · February 26, 2025 · 3 cites

TL;DR

This paper introduces SADI, a dynamic, semantics-aware activation intervention method for LLMs that improves task performance by adaptively steering model behavior at inference time without retraining.

Contribution

SADI is the first method to construct a dynamic steering vector that depends on input semantics for activation intervention, improving adaptability and effectiveness over fixed-vector methods.

Findings

01 SADI outperforms baseline intervention methods significantly.

02 SADI improves task performance without additional training.

03 SADI is applicable across various LLM architectures and tasks.

Abstract

Large language models (LLMs) have achieved remarkable performance across many tasks, yet aligning them with desired behaviors remains challenging. Activation intervention has emerged as an effective and economical method to modify the behavior of LLMs. Despite considerable interest in this area, current intervention methods exclusively employ a fixed steering vector to modify model activations, lacking adaptability to diverse input semantics. To address this limitation, we propose Semantics-Adaptive Dynamic Intervention (SADI), a novel method that constructs a dynamic steering vector to intervene on model activations at inference time. More specifically, SADI utilizes activation differences in contrastive pairs to precisely identify critical elements of an LLM (i.e., attention heads, hidden states, and neurons) for targeted intervention. During inference, SADI dynamically steers model…
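The abstract is cut off before the steering rule is stated, so the following is only a minimal sketch of the idea it describes, not the authors' implementation. It assumes that critical elements are chosen as the top-k entries of the mean activation difference over contrastive pairs, and that the dynamic step scales those entries by the input's own activation. The function names, the `delta` strength parameter, and the toy numbers are all hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def mean_difference(pos: Sequence[Vector], neg: Sequence[Vector]) -> Vector:
    """Average element-wise activation difference over contrastive pairs."""
    n = len(pos)
    dim = len(pos[0])
    return [sum(p[i] - q[i] for p, q in zip(pos, neg)) / n for i in range(dim)]


def top_k_mask(diff: Vector, k: int) -> Vector:
    """Binary mask over the k elements with the largest absolute difference.

    Assumption: "critical elements" are picked by magnitude of the mean difference.
    """
    ranked = sorted(range(len(diff)), key=lambda i: abs(diff[i]), reverse=True)
    keep = set(ranked[:k])
    return [1.0 if i in keep else 0.0 for i in range(len(diff))]


def dynamic_steer(activation: Vector, mask: Vector, delta: float) -> Vector:
    """Assumed dynamic rule: the shift depends on the input's own activation."""
    return [a + delta * a * m for a, m in zip(activation, mask)]


def fixed_steer(activation: Vector, diff: Vector, delta: float) -> Vector:
    """Fixed-vector baseline: the same shift is added for every input."""
    return [a + delta * d for a, d in zip(activation, diff)]


# Toy activations for two contrastive pairs (hypothetical values).
pos = [[1.0, 0.0, 2.0, 0.5], [3.0, 0.0, 2.0, 0.5]]
neg = [[0.0, 0.0, 2.0, 0.0], [0.0, 0.0, 2.0, 0.5]]

diff = mean_difference(pos, neg)
mask = top_k_mask(diff, k=2)

input_a = [1.0, 1.0, 1.0, -2.0]
input_b = [2.0, 1.0, 1.0, 4.0]
steered_a = dynamic_steer(input_a, mask, delta=0.5)
steered_b = dynamic_steer(input_b, mask, delta=0.5)
baseline_a = fixed_steer(input_a, diff, delta=0.5)
```

In this sketch the shift applied to `input_a` and `input_b` differs because it is proportional to each input's activation at the masked positions, whereas `fixed_steer` adds the same vector regardless of the input. That contrast is the distinction the abstract draws between dynamic and fixed steering vectors.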

Code & Models

Repositories

weixuan-wang123/SADI
PyTorch · Official

Videos

Semantics-Adaptive Activation Intervention for LLMs via Dynamic Steering Vectors · SlidesLive

Taxonomy

Topics: Natural Language Processing Techniques · Semantic Web and Ontologies · Intelligent Tutoring Systems and Adaptive Learning

Methods: Softmax · Attention Is All You Need