TL;DR
Spherical Steering introduces a geometry-aware activation rotation method for language models that enables effective inference-time control without retraining, outperforming traditional addition-based steering methods.
Contribution
The paper proposes a training-free technique for inference-time steering that rotates activations along a geodesic toward a target direction, preserving signal integrity and improving control accuracy.
Findings
Outperforms addition-based baselines by +10% on multiple benchmarks
Maintains open-ended generation quality while steering effectively
Demonstrates the importance of geometric consistency in activation manipulation
Abstract
Inference-time steering offers a promising way to control language models (LMs) without retraining. However, standard approaches typically rely on activation addition, which inevitably alters the hidden-state magnitudes, raising concerns about representation collapse and degraded open-ended generation. In this work, we explore Spherical Steering, a training-free primitive that resolves this trade-off through activation rotation. Rather than shifting activations with a fixed vector, our method rotates them along a geodesic toward a target direction, preserving signal integrity while steering toward the target concept. To further enhance adaptivity, we incorporate a confidence gate that dynamically modulates steering strength based on input uncertainty. Extensive experiments across multiple-choice benchmarks demonstrate that Spherical Steering significantly outperforms addition-based…
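The contrast between adding a steering vector and rotating toward a direction can be illustrated with a small NumPy sketch. This is not the paper's implementation: the function names `rotate_toward` and `add_steer`, the interpolation fraction `t`, and the scale `alpha` are hypothetical, and the confidence gate mentioned in the abstract is not modeled. The sketch assumes that "rotating along a geodesic" means moving the normalized activation along the great circle toward the normalized target and then restoring the original norm.

```python
import numpy as np

def rotate_toward(h, target, t):
    """Rotate h toward `target` by a fraction t of the angle between them.

    Hypothetical sketch: the rotation follows the great circle (geodesic on
    the sphere) through h and target, so the norm of h is unchanged.
    """
    norm = np.linalg.norm(h)
    u = h / norm                              # unit activation
    v = target / np.linalg.norm(target)       # unit target direction
    cos_theta = np.clip(np.dot(u, v), -1.0, 1.0)
    theta = np.arccos(cos_theta)              # angle between u and v
    w = v - cos_theta * u                     # part of v orthogonal to u
    w_norm = np.linalg.norm(w)
    if w_norm < 1e-8:
        # Parallel or antiparallel: no unique geodesic, leave h unchanged.
        return h.copy()
    w = w / w_norm
    phi = t * theta                           # angle actually rotated
    return norm * (np.cos(phi) * u + np.sin(phi) * w)

def add_steer(h, target, alpha):
    """Addition-based baseline: shift h by alpha times the unit target."""
    return h + alpha * target / np.linalg.norm(target)

if __name__ == "__main__":
    h = np.array([3.0, 4.0, 0.0])
    d = np.array([0.0, 0.0, 1.0])
    print(np.linalg.norm(rotate_toward(h, d, 0.5)))  # norm stays that of h
    print(np.linalg.norm(add_steer(h, d, 5.0)))      # norm grows
```

With `t = 0` the activation is returned unchanged and with `t = 1` it points exactly along the target, while its magnitude stays fixed throughout. The additive baseline changes the magnitude whenever `alpha` is nonzero and the shift is not offset by the activation's existing component along the target.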
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Artificial Intelligence in Healthcare and Education
