Prompt-Activation Duality: Improving Activation Steering via Attention-Level Interventions

Diancheng Kang; Zheyuan Liu; Ningshan Ma; Yue Huang; Zhaoxuan Tan; Meng Jiang

arXiv:2605.10664·cs.CL·May 15, 2026

TL;DR

This paper introduces Gated Cropped Attention-Delta steering (GCAD), an activation-steering method for language models that preserves trait control and improves long-horizon coherence in multi-turn dialogue by addressing KV-cache contamination.

Contribution

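The paper identifies KV-cache contamination as a key failure mode of residual-stream steering in stateful dialogue and proposes GCAD, an attention-level intervention that extracts steering signals from system-prompt contributions to self-attention and applies them with token-level gating.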

Findings

01. On the main multi-turn benchmark, GCAD improves average coherence drift from -18.6 to -1.9.

02. GCAD raises trait expression at turn 10 from 78.0 to 93.1.

03. Across persona-steering experiments, GCAD preserves trait control while substantially improving long-horizon coherence.

Abstract

Activation steering controls language model behavior by adding directions to internal representations at inference time, but standard residual-stream steering can fail in stateful dialogue. We identify KV-cache contamination as a key failure mode: steered token states are stored and repeatedly reused, turning a local perturbation into cumulative coherence degradation. To address this challenge, we propose Gated Cropped Attention-Delta steering (GCAD), which extracts steering signals from system-prompt contributions to self-attention and applies them with token-level gating. Across persona-steering experiments, GCAD preserves trait control while substantially improving long-horizon coherence. On the main multi-turn benchmark, GCAD improves average coherence drift from -18.6 to -1.9 and raises turn-10 trait expression from 78.0 to 93.1. These results suggest that activation steering…
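The two ideas in the abstract can be illustrated with a toy single-head attention layer. The sketch below is not the paper's implementation: the crop rule and the gating rule are not described on this page, so the names `attend`, `steer_attention`, `attn_delta`, and the scalar `gate` are hypothetical, and the random matrices stand in for real model weights. Part (a) shows how adding a direction to residual states shifts every key and value that is later cached. Part (b) shows one plausible reading of an "attention delta": the difference between the attention output with and without system-prompt tokens, added back under a gate while the cache stays clean.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attend(q, K, V):
    # Single-head attention of one query vector over cached keys/values.
    scores = K @ q / np.sqrt(q.shape[-1])
    return softmax(scores) @ V

rng = np.random.default_rng(0)
d = 8                                        # toy hidden size (assumption)
W_k = rng.normal(size=(d, d)) / np.sqrt(d)   # stand-in key projection
W_v = rng.normal(size=(d, d)) / np.sqrt(d)   # stand-in value projection

sys_h = rng.normal(size=(3, d))   # hidden states of system-prompt tokens
dlg_h = rng.normal(size=(5, d))   # hidden states of dialogue tokens
q = rng.normal(size=d)            # query of the token being generated

# (a) Residual-stream steering: the perturbed states are what gets projected
# into the KV cache, so every later turn attends over shifted keys/values.
direction = rng.normal(size=d)    # hypothetical steering direction
alpha = 4.0                       # hypothetical steering strength
steered_h = dlg_h + alpha * direction
K_clean, V_clean = dlg_h @ W_k, dlg_h @ W_v
K_dirty, V_dirty = steered_h @ W_k, steered_h @ W_v
cache_shift = np.abs(K_dirty - K_clean).max()   # nonzero: cache is altered

# (b) Attention-level delta: measure what the system prompt contributes to
# the attention output, then add it back without touching the clean cache.
full_h = np.concatenate([sys_h, dlg_h])
out_with_prompt = attend(q, full_h @ W_k, full_h @ W_v)
out_without_prompt = attend(q, K_clean, V_clean)
attn_delta = out_with_prompt - out_without_prompt

def steer_attention(base_out, delta, gate):
    # gate is a per-token scalar in [0, 1]; how it is chosen is an assumption.
    return base_out + gate * delta

steered_out = steer_attention(out_without_prompt, attn_delta, gate=1.0)
```

In this toy setting, a gate of 1 reproduces the output the model would have produced with the system prompt present, and a gate of 0 leaves the token unsteered; in both cases the cached keys and values are the clean ones.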

Code & Models

Repositories

nihii-obstat/Gated-Cropped-Attention-Delta (GitHub)
