Curveball Steering: The Right Direction To Steer Isn't Always Linear
Shivam Raval, Hae Jin Song, Linlin Wu, Abir Harrasse, Jeff M. Phillips, Fazl Barez, Amirali Abdullah

TL;DR
This paper challenges the linearity assumption behind activation steering for LLMs (the Linear Representation Hypothesis) and proposes Curveball steering, a nonlinear method that better respects the intrinsic geometry of activation spaces and leads to improved control.
Contribution
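It introduces Curveball steering, a nonlinear intervention technique based on polynomial kernel PCA. Curveball steering intervenes in the kernel feature space rather than along a single global linear direction. This lets it account for the geometric distortion of activation spaces and makes LLM control more effective.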
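The contribution above names polynomial kernel PCA as the basis of Curveball steering. The block below is a minimal sketch of what an intervention in a polynomial-kernel feature space can look like. It is not the authors' implementation. It fits kernel PCA on toy "concept present" and "concept absent" activations and takes the difference of class means in kernel-PC coordinates as the steering direction. It then shifts an activation's feature-space image along that direction and recovers an activation by gradient descent on the feature-space distance (a pre-image search). The class `KernelPCASteerer`, the kernel parameters, the mean-difference direction, and the pre-image step are all assumptions made for illustration.

```python
import numpy as np


def poly_kernel(A, B, gamma=1.0, coef0=1.0, degree=2):
    """Polynomial kernel k(a, b) = (gamma * a.b + coef0) ** degree for every row pair."""
    return (gamma * A @ B.T + coef0) ** degree


class KernelPCASteerer:
    """Hypothetical sketch of steering in polynomial-kernel-PCA feature space.

    Assumptions (not taken from the paper): the steering direction is the
    difference of class means in kernel-PC coordinates, and the steered
    feature vector is mapped back to activation space by gradient descent on
    the feature-space distance (a pre-image search).
    """

    def __init__(self, n_components=3, gamma=1.0, coef0=1.0, degree=2):
        self.m = n_components
        self.kw = dict(gamma=gamma, coef0=coef0, degree=degree)

    def fit(self, pos, neg):
        self.Z = np.vstack([pos, neg]).astype(float)
        n = len(self.Z)
        K = poly_kernel(self.Z, self.Z, **self.kw)
        H = np.eye(n) - 1.0 / n                          # centring matrix
        lam, V = np.linalg.eigh(H @ K @ H)
        top = np.argsort(lam)[::-1][: self.m]
        lam, V = lam[top], V[:, top]
        keep = lam > 1e-10                               # drop numerically null components
        self.A = V[:, keep] / np.sqrt(lam[keep])         # unit-norm components in feature space
        self.K_col_mean, self.K_mean = K.mean(axis=0), K.mean()
        y = self.project(self.Z)
        self.delta = y[: len(pos)].mean(axis=0) - y[len(pos):].mean(axis=0)
        return self

    def project(self, X):
        """Coordinates of the rows of X on the kept kernel principal components."""
        Kx = poly_kernel(np.atleast_2d(X), self.Z, **self.kw)
        Kx_c = Kx - Kx.mean(axis=1, keepdims=True) - self.K_col_mean + self.K_mean
        return Kx_c @ self.A

    def steer(self, x, alpha=1.0, n_steps=100):
        """Shift phi(x) by alpha * delta along the components, then search for a pre-image."""
        x = np.asarray(x, dtype=float)
        beta = self.A @ self.delta                       # shift = sum_i beta_i (phi(z_i) - mean phi)
        coef = alpha * (beta - beta.mean())              # same shift as sum_i coef_i phi(z_i)
        g, c, p = self.kw["gamma"], self.kw["coef0"], self.kw["degree"]

        def objective(u):
            # ||phi(u) - phi(x) - shift||^2, dropping terms that do not depend on u
            return ((g * u @ u + c) ** p - 2 * (g * u @ x + c) ** p
                    - 2 * coef @ (g * self.Z @ u + c) ** p)

        def grad(u):
            return (2 * p * g * (g * u @ u + c) ** (p - 1) * u
                    - 2 * p * g * (g * u @ x + c) ** (p - 1) * x
                    - 2 * p * g * ((g * self.Z @ u + c) ** (p - 1) * coef) @ self.Z)

        u, f_u = x.copy(), objective(x)
        for _ in range(n_steps):
            d, step = grad(u), 1.0
            while step > 1e-12:                          # backtracking: accept only strict decreases
                f_c = objective(u - step * d)
                if f_c < f_u:
                    u, f_u = u - step * d, f_c
                    break
                step *= 0.5
            else:
                break                                    # no decrease found: stop
        return u


rng = np.random.default_rng(0)
pos = rng.normal(0.0, 0.3, size=(20, 4)) + 1.0         # toy "concept present" activations
neg = rng.normal(0.0, 0.3, size=(20, 4)) - 1.0         # toy "concept absent" activations
steerer = KernelPCASteerer(n_components=3).fit(pos, neg)
x_steered = steerer.steer(neg[0], alpha=1.0)
# Movement along the steering direction in kernel-PC coordinates (positive = toward pos).
print((steerer.project(x_steered)[0] - steerer.project(neg[0])[0]) @ steerer.delta)
```

The pre-image search only accepts steps that lower the feature-space distance to the shifted target. As a result, the final activation's coordinate along the steering direction ends up larger than the starting one, and the printed value is positive. With `degree=1` the kernel is linear, and the pre-image step reduces to a plain linear shift along a PCA-derived direction. The kernel degree is therefore what lets this sketch depart from linear steering.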
Findings
Curveball steering outperforms linear PCA-based steering in distorted activation spaces.
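Activation spaces exhibit substantial, concept-dependent geometric distortion (measured as the ratio of geodesic to Euclidean distances), so a globally linear geometry approximates them poorly.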
Nonlinear interventions better align with the intrinsic geometry of LLM activation spaces.
Abstract
Activation steering is a widely used approach for controlling large language model (LLM) behavior by intervening on internal representations. Existing methods largely rely on the Linear Representation Hypothesis, assuming behavioral attributes can be manipulated using global linear directions. In practice, however, such linear interventions often behave inconsistently. We question this assumption by analyzing the intrinsic geometry of LLM activation spaces. Measuring geometric distortion via the ratio of geodesic to Euclidean distances, we observe substantial and concept-dependent distortions, indicating that activation spaces are not well-approximated by a globally linear geometry. Motivated by this, we propose "Curveball steering", a nonlinear steering method based on polynomial kernel PCA that performs interventions in a feature space, better respecting the learned activation…
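The abstract's distortion measure, the ratio of geodesic to Euclidean distances, can be illustrated with an Isomap-style estimate. Geodesic distance is approximated by shortest paths over a k-nearest-neighbour graph of the points, and the geodesic-to-Euclidean ratio is then averaged over point pairs. This is a minimal sketch under stated assumptions, not the paper's estimator. The kNN graph, the choice of `k`, and averaging over all pairs are our choices, and `distortion_ratio` is a hypothetical helper. A ratio near 1 indicates locally flat geometry, and larger values indicate curvature.

```python
import numpy as np


def distortion_ratio(X, k=5):
    """Mean ratio of approximate geodesic to Euclidean distance over all point pairs.

    Hypothetical helper: geodesics are approximated Isomap-style by shortest
    paths on a symmetrised k-nearest-neighbour graph (an assumption, not the
    paper's stated estimator).
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))            # pairwise Euclidean distances

    # Keep only edges to each point's k nearest neighbours (column 0 is the point itself).
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    nbrs = np.argsort(D, axis=1)[:, 1 : k + 1]
    rows = np.repeat(np.arange(n), nbrs.shape[1])
    cols = nbrs.ravel()
    G[rows, cols] = D[rows, cols]
    G[cols, rows] = D[rows, cols]                    # make the graph undirected

    # Floyd-Warshall: shortest path lengths approximate geodesic distances.
    for m in range(n):
        G = np.minimum(G, G[:, m : m + 1] + G[m : m + 1, :])

    iu = np.triu_indices(n, k=1)
    geo, euc = G[iu], D[iu]
    ok = np.isfinite(geo) & (euc > 0)                # skip disconnected or duplicate pairs
    return float(np.mean(geo[ok] / euc[ok]))


t = np.linspace(0.0, 1.0, 30)
line = np.stack([t, 2.0 * t], axis=1)                # flat: geodesic equals Euclidean
theta = np.linspace(0.0, 1.5 * np.pi, 40)
arc = np.stack([np.cos(theta), np.sin(theta)], axis=1)  # curved: chords cut across the arc
print(distortion_ratio(line), distortion_ratio(arc))
```

The collinear points give a ratio of 1 up to rounding. The 270-degree arc gives a ratio above 1, because Euclidean chords cut across the curve while graph paths follow it. Applying the same helper to activations grouped by concept would yield one ratio per concept, which is the kind of concept-dependent comparison the abstract describes.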
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
