Global Evolutionary Steering: Refining Activation Steering Control via Cross-Layer Consistency
Xinyan Jiang, Wenjing Yu, Di Wang, Lijie Hu

TL;DR
This paper introduces GER-steer, a training-free activation control method for LLMs that improves robustness and generalization by leveraging the network's geometric stability, addressing noise and semantic drift issues.
Contribution
GER-steer is a novel, training-free framework that refines activation steering vectors using global geometric signals, enhancing reliability and universality in model alignment.
Findings
GER-steer outperforms existing methods in various evaluations
It effectively decouples semantic intent from artifacts
It demonstrates superior robustness without layer-specific tuning
Abstract
Activation engineering enables precise control over Large Language Models (LLMs) without the computational cost of fine-tuning. However, existing methods that derive steering vectors from static activation differences are susceptible to high-dimensional noise and layer-wise semantic drift, and often capture spurious correlations rather than the target intent. To address this, we propose Global Evolutionary Refined Steering (GER-steer), a training-free framework grounded in the geometric stability of the network's representation evolution. GER-steer exploits this global signal to rectify raw steering vectors, effectively decoupling robust semantic intent from orthogonal artifacts. Extensive evaluations confirm that GER-steer consistently outperforms baselines, delivering superior efficacy and generalization without layer-specific tuning, establishing a universal solution for reliable model…
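To make the abstract's terms concrete, the sketch below shows a common baseline for "vectors from static activation differences" (a per-layer difference of means) followed by one possible way to use a cross-layer signal to strip orthogonal components. It is an illustration under stated assumptions, not the GER-steer algorithm, which this summary does not specify. The function names, the synthetic activations, and the choice of the dominant singular vector as the shared direction are all hypothetical.

```python
import numpy as np


def raw_steering_vectors(pos_acts, neg_acts):
    """Baseline: per-layer difference of mean activations.

    pos_acts, neg_acts: arrays of shape (layers, samples, hidden),
    e.g. residual-stream activations for prompts with / without a trait.
    Returns an array of shape (layers, hidden).
    """
    return pos_acts.mean(axis=1) - neg_acts.mean(axis=1)


def refine_with_global_direction(raw):
    """Hypothetical refinement (NOT the paper's method).

    Estimates one direction shared across layers and keeps only the
    component of each layer's raw vector along that direction.
    """
    norms = np.linalg.norm(raw, axis=1, keepdims=True)
    unit = raw / np.clip(norms, 1e-12, None)
    # Dominant right singular vector of the stacked unit vectors.
    _, _, vt = np.linalg.svd(unit, full_matrices=False)
    g = vt[0]
    # SVD sign is arbitrary; align it with the average layer direction.
    if np.dot(g, unit.mean(axis=0)) < 0:
        g = -g
    coeffs = raw @ g  # signed length of each layer's vector along g
    refined = coeffs[:, None] * g[None, :]
    return refined, g


# Synthetic stand-in for model activations (assumption: a single true
# direction shared by all layers, plus isotropic Gaussian noise).
rng = np.random.default_rng(0)
layers, samples, hidden = 6, 64, 64
true_dir = np.zeros(hidden)
true_dir[0] = 1.0
neg = rng.normal(size=(layers, samples, hidden))
pos = rng.normal(size=(layers, samples, hidden)) + 3.0 * true_dir

raw = raw_steering_vectors(pos, neg)
refined, g = refine_with_global_direction(raw)
residual = raw - refined  # the part treated as layer-specific artifact
```

In this toy setting the residual is, by construction, orthogonal to the shared direction, which mirrors the abstract's framing of separating intent from orthogonal artifacts. With a real model, the activations would come from forward passes rather than a random generator.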
Taxonomy
Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
