When Is Rank-1 Steering Cheap? Geometry, Granularity, and Budgeted Search
John T. Robertson, Jianing Zhu, Haris Vikalo, Zhangyang Wang

TL;DR
This paper studies the geometry and difficulty of rank-1 activation steering in large language models. It proposes a geometry-guided search over intervention layer and coefficient, together with a framework that uses concept granularity to improve control efficiency.
Contribution
It introduces a geometry-guided search method that reduces the number of trials needed to find an effective rank-1 steering intervention, and presents GRACE, a framework for diagnosing and addressing steering challenges using activation geometry.
Findings
Geometry-guided search reduces the trials needed to recover 95% of best-found utility by 39.8% on average across three model families.
Concept granularity correlates with steering difficulty and convergence speed.
The GRACE framework diagnoses steering difficulty from activation geometry and improves steering stability.
Abstract
Activation steering offers a lightweight way to control LLMs without retraining, but its effectiveness varies sharply across concepts. Prior work often reads this variability as evidence that many concepts are not captured by a single steering direction. We argue instead that much of it reflects search difficulty: a useful rank-1 intervention often exists, but finding it can be expensive. We formalize rank-1 steering as a budget-constrained optimization over intervention layer and coefficient. Across concepts and model families, prompt-boundary directional alignment predicts where effective interventions occur, enabling geometry-guided search that reaches high utility with substantially fewer evaluations, reducing the trials needed to recover 95% of best-found utility by 39.8% on average across three model families. To explain why some concepts remain expensive even under better…
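The budgeted-search framing in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the toy utility surface, and the `layer_scores` prior (a stand-in for the prompt-boundary alignment signal) are all hypothetical, chosen only to show how a geometric prior over layers can reduce the trials needed to reach 95% of best-found utility.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple


def budgeted_search(
    layers: Sequence[int],
    coefficients: Sequence[float],
    utility: Callable[[int, float], float],
    budget: int,
    layer_scores: Optional[Dict[int, float]] = None,
) -> Tuple[Tuple[int, float], float, List[float]]:
    """Evaluate at most `budget` (layer, coefficient) pairs.

    If `layer_scores` is given (hypothetical geometric prior), layers with
    higher scores are tried first. Returns the best pair, its utility, and
    the best-so-far utility after each trial.
    """
    order = list(layers)
    if layer_scores is not None:
        order.sort(key=lambda layer: layer_scores[layer], reverse=True)
    candidates = [(layer, coef) for layer in order for coef in coefficients]

    best_pair, best_u, trace = None, float("-inf"), []
    for layer, coef in candidates[:budget]:
        u = utility(layer, coef)  # one "trial" = one steering evaluation
        if u > best_u:
            best_pair, best_u = (layer, coef), u
        trace.append(best_u)
    return best_pair, best_u, trace


def trials_to_fraction(trace: List[float], fraction: float = 0.95) -> int:
    """First trial at which best-so-far reaches `fraction` of the final best.

    Assumes utilities are positive.
    """
    target = fraction * trace[-1]
    for i, u in enumerate(trace, start=1):
        if u >= target:
            return i
    return len(trace)


# Toy utility surface (an assumption): peaks at layer 5, coefficient 4.0.
def toy_utility(layer: int, coef: float) -> float:
    return math.exp(-((layer - 5) ** 2) / 2 - ((coef - 4.0) ** 2) / 8)


layers = range(8)
coefs = [1.0, 2.0, 4.0, 8.0]
budget = len(layers) * len(coefs)

# Hypothetical prior that happens to rank the best layer first.
prior = {layer: -abs(layer - 5) for layer in layers}

_, _, naive_trace = budgeted_search(layers, coefs, toy_utility, budget)
best_pair, best_u, guided_trace = budgeted_search(
    layers, coefs, toy_utility, budget, layer_scores=prior
)

naive_trials = trials_to_fraction(naive_trace)
guided_trials = trials_to_fraction(guided_trace)
print(best_pair, naive_trials, guided_trials)
```

On this toy surface the prior-ordered search reaches the 95% threshold in far fewer trials than the layer-by-layer sweep. The size of that gap depends entirely on how well the prior tracks the true optimum, so it says nothing about the 39.8% figure reported in the paper.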
