Steer Like the LLM: Activation Steering that Mimics Prompting

Geert Heyman; Frederik Vandeputte

arXiv:2605.03907 · cs.CL · May 6, 2026

TL;DR

This paper introduces Prompt Steering Replacement (PSR), an activation steering method that mimics prompt-based steering by estimating token-specific steering coefficients from the model's activations, giving better control over language models than existing activation steering methods.

Contribution

The paper formulates prompt steering as activation steering, revealing limitations of existing methods, and proposes PSR models that outperform current activation steering techniques.
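A minimal way to write this formulation down, in our own notation rather than the paper's (the layer index ℓ, hidden state h, steering prompt p, input x, direction v, and coefficients α and α_t are all assumptions): the change that a steering prompt p induces in the hidden state of input token x_t at layer ℓ can be treated as an additive intervention δ_t, and activation steering methods then differ in how they approximate that δ_t.

```latex
\delta_t^{(\ell)} = h^{(\ell)}_t(p \oplus x) - h^{(\ell)}_t(x),
\qquad
h^{(\ell)}_t(x) + \delta_t^{(\ell)} = h^{(\ell)}_t(p \oplus x)

\text{fixed-coefficient steering: } \delta_t^{(\ell)} \approx \alpha\, v \ \ \text{for all } t,
\qquad
\text{token-specific steering: } \delta_t^{(\ell)} \approx \alpha_t\, v
```

Here h_t(p ⊕ x) denotes the state of the same token x_t when p is prepended, so its absolute position shifts by the prompt length. The identity holds by construction at layer ℓ, but in the prompted run later layers can also attend to the prompt tokens, so the paper's exact treatment may differ from this sketch.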

Findings

01 PSR models outperform existing activation steering methods.

02 PSR achieves results comparable to or better than prompting on steering benchmarks.

03 Activation steering methods often do not faithfully replicate the mechanics of prompt steering.
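To see why a single shared coefficient struggles when a prompt affects some tokens strongly and others barely at all, here is a minimal Python sketch on hypothetical toy deltas (illustrative numbers, not data from the paper). It assumes every token's prompt-induced change d_t lies along one direction v, and compares the best least-squares fit using one shared coefficient α against a separate coefficient α_t per token. The helper names are hypothetical.

```python
# Hypothetical toy example: one steering direction v and per-token deltas that are
# large on one token and near zero on the others (illustrative numbers only).

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def best_uniform_alpha(D, v):
    """Least-squares alpha minimizing sum_t ||alpha * v - d_t||^2 (one shared coefficient)."""
    return sum(dot(d, v) for d in D) / (len(D) * dot(v, v))

def best_token_alphas(D, v):
    """Least-squares alpha_t for each token separately (projection of d_t onto v)."""
    return [dot(d, v) / dot(v, v) for d in D]

def squared_residual(D, v, alphas):
    """Total squared error between alpha_t * v and the target deltas d_t."""
    return sum(
        sum((a * vi - di) ** 2 for vi, di in zip(v, d))
        for a, d in zip(alphas, D)
    )

v = [1.0, 0.0, 0.0]
D = [[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [4.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

alpha = best_uniform_alpha(D, v)
uniform_err = squared_residual(D, v, [alpha] * len(D))
token_err = squared_residual(D, v, best_token_alphas(D, v))
print(f"shared alpha={alpha:.3f}, error={uniform_err:.3f}")
print(f"per-token alphas, error={token_err:.3f}")
```

The shared coefficient overshoots the unaffected tokens and undershoots the strongly affected one, while per-token coefficients fit this toy case exactly. How PSR actually estimates α_t is not specified in this summary.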

Abstract

Large language models can be steered at inference time through prompting or activation interventions, but activation steering methods often underperform compared to prompt-based approaches. We propose a framework that formulates prompt steering as a form of activation steering and investigates whether distilling successful prompt steering behavior into simpler, interpretable models can close this gap. Our analysis reveals that popular activation steering methods are not faithful to the mechanics of prompt steering, which applies strong interventions on some tokens while barely affecting others. Based on these insights, we introduce Prompt Steering Replacement (PSR) models that estimate token-specific steering coefficients from the activations themselves and are trained to imitate prompt-based interventions. Experiments on three steering benchmarks across multiple language models show…
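The abstract states that PSR models estimate token-specific steering coefficients from the activations themselves and are trained to imitate prompt-based interventions. Below is a minimal sketch of that training idea under our own assumptions, not the authors' architecture: a single shared direction v, a hypothetical coefficient head α_t = softplus(w·h_t + b), and a mean-squared-error objective against prompt-induced deltas δ_t, fitted by plain gradient descent on synthetic data. The names `fit_token_coefficients` and `mse` are hypothetical.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softplus(z):
    # log(1 + e^z), written to avoid overflow for large |z|
    return z + math.log1p(math.exp(-z)) if z > 0 else math.log1p(math.exp(z))

def sigmoid(z):
    # derivative of softplus; numerically stable in both directions
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def mse(H, D, w, b, v):
    """Mean over tokens of ||alpha_t * v - delta_t||^2 with alpha_t = softplus(w . h_t + b)."""
    total = 0.0
    for h, d in zip(H, D):
        a = softplus(dot(w, h) + b)
        total += sum((a * vi - di) ** 2 for vi, di in zip(v, d))
    return total / len(H)

def fit_token_coefficients(H, D, steps=1000, lr=0.05, seed=0):
    """Fit the coefficient head (w, b) and shared direction v by gradient descent so that
    alpha_t * v approximates the prompt-induced deltas D for activations H."""
    rng = random.Random(seed)
    dim, T = len(H[0]), len(H)
    w = [rng.gauss(0.0, 0.01) for _ in range(dim)]
    b, v = 0.0, [0.0] * dim
    losses = []
    for _ in range(steps):
        gw, gb, gv = [0.0] * dim, 0.0, [0.0] * dim
        for h, d in zip(H, D):
            z = dot(w, h) + b
            a = softplus(z)
            r = [a * vi - di for vi, di in zip(v, d)]  # residual vector for token t
            g_a = 2.0 / T * dot(r, v)                    # dL/d alpha_t
            s = sigmoid(z)                               # d alpha_t / d z_t
            for i in range(dim):
                gv[i] += 2.0 / T * a * r[i]
                gw[i] += g_a * s * h[i]
            gb += g_a * s
        # simultaneous update from gradients computed at the old parameters
        w = [wi - lr * g for wi, g in zip(w, gw)]
        v = [vi - lr * g for vi, g in zip(v, gv)]
        b -= lr * gb
        losses.append(mse(H, D, w, b, v))
    return w, b, v, losses

# Synthetic stand-in data: random "activations" and deltas generated by a hidden
# ground-truth head, so the model class can in principle fit them.
rng = random.Random(1)
dim, T = 4, 64
true_w = [rng.gauss(0.0, 1.0) for _ in range(dim)]
true_v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
H = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(T)]
D = [[softplus(dot(true_w, h) + 0.5) * vi for vi in true_v] for h in H]

initial = mse(H, D, [0.0] * dim, 0.0, [0.0] * dim)
w, b, v, losses = fit_token_coefficients(H, D, steps=500)
print(f"loss: {initial:.3f} -> {losses[-1]:.3f}")
```

In a real setting, H would hold a model's activations at some layer for the unprompted input and D the differences against the prompted run. The softplus keeps coefficients non-negative, which is one design choice among several; the paper's own parameterization may differ.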
