Contextual Linear Activation Steering of Language Models
Brandon Hsu, Daniel Beaglehole, Adityanarayanan Radhakrishnan, Mikhail Belkin

TL;DR
The paper introduces Contextual Linear Activation Steering (CLAS), a novel method that dynamically adjusts steering strength based on context, improving language model customization with limited labeled data.
Contribution
CLAS is a scalable, interpretable method for steering large language models that adapts the steering strength to context, and it outperforms steering with a fixed strength.
Findings
CLAS consistently outperforms standard linear activation steering across eleven steering benchmarks and four model families.
CLAS matches or exceeds the performance of ReFT and LoRA in settings with limited labeled data.
CLAS is scalable and interpretable for model specialization.
Abstract
Linear activation steering is a powerful approach for eliciting the capabilities of large language models and specializing their behavior using limited labeled data. While effective, existing methods often apply a fixed steering strength to all tokens, resulting in inconsistent steering quality across diverse input prompts. In this work, we introduce Contextual Linear Activation Steering (CLAS), a method that dynamically adapts linear activation steering to context-dependent steering strengths. Across eleven steering benchmarks and four model families, it consistently outperforms standard linear activation steering and matches or exceeds the performance of ReFT and LoRA in settings with limited labeled data. We therefore propose CLAS as a scalable, interpretable, and accurate method for specializing and steering large language models.
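To illustrate the contrast the abstract draws, here is a minimal sketch of fixed-strength versus context-dependent steering on toy activation vectors. It is not the authors' implementation: the abstract does not state how CLAS computes its strengths, so the affine strength function (`make_linear_strength`, with weights `w` and bias `b`), the helper names, and the toy numbers are all assumptions made for illustration.

```python
from typing import Callable, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def steer_fixed(h: Vector, v: Vector, alpha: float) -> Vector:
    """Fixed steering: add the same multiple of direction v to every activation."""
    return [hi + alpha * vi for hi, vi in zip(h, v)]


def steer_contextual(
    h: Vector, v: Vector, strength: Callable[[Vector], float]
) -> Vector:
    """Contextual steering: the multiple of v depends on the activation h itself."""
    alpha = strength(h)
    return [hi + alpha * vi for hi, vi in zip(h, v)]


def make_linear_strength(w: Vector, b: float) -> Callable[[Vector], float]:
    """Hypothetical strength function: affine in the activation (an assumption,
    not taken from the paper)."""
    return lambda h: dot(w, h) + b


# Toy data: one steering direction and two token activations.
v = [1.0, 0.0]
tokens = [[0.0, 2.0], [3.0, 1.0]]

# Assumed strength: push less when the activation already points along v.
strength = make_linear_strength(w=[-1.0, 0.0], b=2.0)

fixed = [steer_fixed(h, v, 2.0) for h in tokens]
contextual = [steer_contextual(h, v, strength) for h in tokens]

print(fixed)       # [[2.0, 2.0], [5.0, 1.0]]
print(contextual)  # [[2.0, 2.0], [2.0, 1.0]]
```

In this toy setup the fixed rule pushes the second token to 5.0 along the steering direction, while the contextual rule brings both tokens to the same value of 2.0, which is the kind of per-token consistency that a context-dependent strength is meant to provide.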
