Beyond Steering Vector: Flow-based Activation Steering for Inference-Time Intervention

Zehao Jin; Ruixuan Deng; Junran Wang; Xinjie Shen; Chao Zhang

arXiv:2605.05892 · cs.CL · May 8, 2026

TL;DR

This paper introduces FLAS, a flow-based activation steering method that learns to modify language model activations at inference time, outperforming prompting on unseen concepts without per-concept tuning.

Contribution

FLAS is a learned, concept-conditioned flow method for activation steering. It is the first learned steering method to consistently surpass prompting, and it challenges previous assumptions about activation-space geometry.

Findings

01

FLAS consistently outperforms in-context prompting on AxBench.

02

FLAS achieves harmonic-mean scores of 1.015 and 1.113 on Gemma-2-2B-IT and Gemma-2-9B-IT, respectively.

03

Learned flows reveal curved, multi-step, token-varying activation trajectories.
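The harmonic-mean scores in finding 02 are easier to read with the aggregation spelled out. The sketch below assumes the AxBench protocol, which this page does not restate: an LLM judge rates concept presence, instruction relevance, and fluency on a 0 to 2 scale, and the three ratings are combined by a harmonic mean. The helper name `combined_score` is hypothetical, and the ratings passed to it are made-up inputs, not results from the paper.

```python
from statistics import harmonic_mean


def combined_score(concept, instruction, fluency):
    """Harmonic mean of three judge ratings, each assumed to lie in [0, 2].

    Hypothetical helper: the aggregation follows our reading of AxBench and is
    not taken from the FLAS paper. A zero on any axis yields a zero overall.
    """
    return harmonic_mean([concept, instruction, fluency])


balanced = combined_score(2, 1, 2)   # 3 / (1/2 + 1/1 + 1/2) = 1.5
lopsided = combined_score(2, 2, 0)   # strong concept, broken fluency -> 0
print(balanced, lopsided)
```

An arithmetic mean would give the lopsided case about 1.33. The harmonic mean drops it to 0, so a steering method scores well only if it keeps all three ratings up at the same time.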

Abstract

Activation steering has emerged as a promising alternative for controlling language-model behavior at inference time by modifying intermediate representations while keeping model parameters frozen. However, large-scale evaluations such as AxBench show that existing steering methods are often outperformed by simple in-context prompting and generalize poorly to unseen concepts. We hypothesize that these limitations arise from unvalidated simplifying assumptions shared across prior methods, which typically restrict steering interventions to fixed, single-step, position-invariant transforms. We propose FLAS (Flow-based Activation Steering), which learns a general, concept-conditioned velocity field $v_{t} (h, t, c)$ that transports unsteered activations to steered ones without relying on these assumptions. On AxBench, FLAS is the first learned method to consistently outperform prompting,…
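The contrast between a fixed steering vector and a flow can be made concrete with a minimal NumPy sketch. It rests on several assumptions. `toy_velocity` is a hand-written, hypothetical stand-in for the learned velocity field $v(h, t, c)$; FLAS trains a network, whose architecture is not described on this page. The concept $c$ is treated as a plain vector that activations are drawn toward. Explicit Euler integration over `n_steps` is our choice of ODE solver, not necessarily the paper's.

```python
import numpy as np

# Illustrative sizes (hypothetical): 5 tokens, hidden size 4.
SEQ_LEN, D = 5, 4
rng = np.random.default_rng(0)

# Random skew-symmetric matrix scaled to spectral norm 1; it adds a rotational
# (path-bending) component to the toy velocity field below.
_S = rng.standard_normal((D, D))
_S = _S - _S.T
A = _S / np.linalg.norm(_S, 2)


def steer_with_vector(h, u, alpha):
    """Baseline: the same fixed shift alpha * u for every token, in one step."""
    return h + alpha * u


def toy_velocity(h, t, c):
    """Hypothetical stand-in for a learned velocity field v(h, t, c).

    h: (seq_len, d) activations, t: scalar time in [0, 1], c: (d,) concept vector.
    The attraction term pulls each token toward c; the rotation term, which fades
    as t -> 1, bends the path. Because v depends on h, every token follows its
    own trajectory. A real flow model would be a trained network instead.
    """
    offset = h - c
    return -offset + (1.0 - t) * offset @ A.T


def integrate_flow(h0, c, velocity, n_steps=8):
    """Explicit Euler integration of dh/dt = v(h, t, c) from t = 0 to t = 1."""
    h = h0.copy()
    dt = 1.0 / n_steps
    for k in range(n_steps):
        h = h + dt * velocity(h, k * dt, c)
    return h


h0 = rng.standard_normal((SEQ_LEN, D))   # unsteered activations at one layer
c = rng.standard_normal(D)               # hypothetical concept vector
u = rng.standard_normal(D)               # hypothetical fixed steering direction

h_vec = steer_with_vector(h0, u, alpha=2.0)
h_flow = integrate_flow(h0, c, toy_velocity)

print("vector shifts identical across tokens:", np.allclose(h_vec - h0, (h_vec - h0)[0]))
print("flow shifts identical across tokens:  ", np.allclose(h_flow - h0, (h_flow - h0)[0]))
```

The vector baseline moves every token by the same amount in a single step, which matches the fixed, single-step, position-invariant transforms the abstract attributes to prior methods. The toy flow instead gives each token its own displacement along a bent, multi-step path while still moving every activation closer to $c$.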
