Angular Steering: Behavior Control via Rotation in Activation Space
Hieu M. Vu, Tan M. Nguyen

TL;DR
Angular Steering introduces a geometric rotation-based method for precise and stable behavior control in large language models, improving upon existing subspace techniques by offering continuous, fine-grained adjustments without compromising overall performance.
Contribution
The paper presents Angular Steering, a novel rotation-based behavior control method that generalizes prior techniques, providing enhanced stability, flexibility, and fine-grained control in large language models.
Findings
Achieves robust behavioral control across multiple models and sizes.
Maintains language modeling performance while controlling behaviors.
Generalizes existing methods under a unified geometric rotation framework.
Abstract
Controlling specific behaviors in large language models while preserving their general capabilities is a central challenge for safe and reliable artificial intelligence deployment. Current steering methods, such as vector addition and directional ablation, are constrained within a two-dimensional subspace defined by the activation and feature direction, making them sensitive to chosen parameters and potentially affecting unrelated features due to unintended interactions in activation space. We introduce Angular Steering, a novel and flexible method for behavior modulation that operates by rotating activations within a fixed two-dimensional subspace. By formulating steering as a geometric rotation toward or away from a target behavior direction, Angular Steering provides continuous, fine-grained control over behaviors such as refusal and compliance. We demonstrate this method using…
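The rotation idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `orthonormal_plane` and `rotate_in_plane`, the use of Gram-Schmidt to build the plane, and the choice of a second spanning direction are all assumptions made for illustration. The sketch rotates only the component of an activation that lies in a fixed two-dimensional plane and leaves the orthogonal remainder untouched.

```python
import numpy as np


def orthonormal_plane(feature_dir, second_dir):
    """Build an orthonormal basis (b1, b2) for the steering plane.

    Assumption: the plane is spanned by a feature direction and one other,
    linearly independent direction; Gram-Schmidt makes them orthonormal.
    """
    b1 = feature_dir / np.linalg.norm(feature_dir)
    residual = second_dir - (second_dir @ b1) * b1
    b2 = residual / np.linalg.norm(residual)
    return b1, b2


def rotate_in_plane(h, b1, b2, angle):
    """Rotate the in-plane component of activation h by `angle` radians.

    The component of h orthogonal to span(b1, b2) is left unchanged,
    so the overall norm of h is preserved.
    """
    x, y = h @ b1, h @ b2                      # in-plane coordinates
    c, s = np.cos(angle), np.sin(angle)
    x_new = c * x - s * y                      # standard 2D rotation
    y_new = s * x + c * y
    return h + (x_new - x) * b1 + (y_new - y) * b2


# Toy example with a 4-dimensional "activation" (hypothetical values).
h = np.array([1.0, 0.0, 2.0, 0.0])
b1, b2 = orthonormal_plane(np.array([3.0, 0.0, 0.0, 0.0]),
                           np.array([1.0, 1.0, 0.0, 0.0]))
steered = rotate_in_plane(h, b1, b2, np.pi / 2)
```

Because the angle is a continuous parameter, sweeping it moves the in-plane component smoothly toward or away from the feature direction `b1`, which is the kind of fine-grained control the abstract describes. Unlike adding a scaled vector, the rotation does not change the activation norm.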
