Attention Hijacking: Response Manipulation Across Queries in Vision-Language Models
Zhiqiang Wang, Dongrui Liu, Yan Li, Zonghao Ying, Wei Xue, Wenhan Luo, Yike Guo

TL;DR
This paper introduces Attention Hijacking, a novel adversarial attack that makes manipulated responses in vision-language models transfer across diverse queries by steering attention toward a persistent image-dominant pattern.
Contribution
It proposes an attack that explicitly steers attention distributions to improve cross-query transferability in VLMs, addressing the tendency of existing attacks to lose effectiveness when the perturbed input is paired with a different query.
Findings
Attention Hijacking significantly improves transferability across diverse queries.
The method makes manipulated responses less dependent on the exact wording of the query.
It extends effectively to multiple attack scenarios, revealing insights into attention stability.
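One way to make "transferability across diverse queries" concrete is to fix a single adversarial image, pair it with several queries, and count how often the target response appears. The sketch below is a minimal illustration under stated assumptions: `respond` is a hypothetical callable wrapping the attacked model, `toy_model` is a made-up stand-in, and case-insensitive substring matching is an assumed success criterion. The paper's actual evaluation protocol is not described in this summary.

```python
from typing import Callable, Sequence


def cross_query_success_rate(
    respond: Callable[[str], str],
    queries: Sequence[str],
    target: str,
) -> float:
    """Fraction of queries whose response contains the target string.

    `respond` is a hypothetical stand-in for a VLM already paired with one
    fixed adversarial image: it maps a text query to the generated response.
    Case-insensitive substring matching is an assumed success criterion.
    """
    if not queries:
        raise ValueError("need at least one query")
    hits = sum(1 for q in queries if target.lower() in respond(q).lower())
    return hits / len(queries)


def toy_model(query: str) -> str:
    # Toy stand-in: the manipulation only "sticks" when the query mentions the image.
    return "Visit evil.example now." if "image" in query else "A cat on a sofa."


queries = ["Describe the image.", "What is in this image?", "Any animals here?"]
rate = cross_query_success_rate(toy_model, queries, "evil.example")
print(f"cross-query success: {rate:.2f}")  # 2 of 3 queries succeed
```

A query-specific attack would score high on the queries it was optimized for and low elsewhere; a cross-query attack aims to keep this rate high across the whole query set.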
Abstract
Existing adversarial attacks on vision-language models (VLMs) can steer model outputs toward attacker-specified target responses, but their effectiveness often degrades when the same perturbed input is paired with different textual queries. This paper studies cross-query response manipulation, where a single adversarial example is expected to remain effective across diverse user queries. We first analyze the limitations of existing attacks and find that successful transfer is closely associated with preserving an image-dominant attention pattern during response generation. Motivated by this observation, we propose Attention Hijacking, a novel adversarial attack that explicitly steers internal attention distributions toward a persistent image-dominant pattern. By amplifying the influence of visual tokens on target response tokens while suppressing the competing influence of…
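The abstract describes the core idea only at a high level. As a rough illustration, assuming attention weights can be read out of the model, the sketch below measures how image-dominant the attention of target-response tokens is and turns that into a loss. The functions `image_attention_share` and `hijack_loss`, the head-averaged single-layer input, and the negative-log-share objective are hypothetical choices, not the paper's formulation. Because the abstract is cut off before naming the competing tokens, the sketch simply treats every non-visual position as competing. In a real attack this scalar would be computed in an autodiff framework and backpropagated to the image perturbation; plain Python here only evaluates it.

```python
import math
from typing import AbstractSet, List, Sequence


def image_attention_share(
    attn: Sequence[Sequence[float]],
    image_positions: AbstractSet[int],
) -> List[float]:
    """Fraction of each row's attention mass that lands on visual tokens.

    attn[i][j] is the attention weight from the i-th target-response token to
    key position j (assumed: one layer, averaged over heads). Rows are
    renormalised in case only a slice of the key axis is passed in.
    """
    shares = []
    for row in attn:
        total = sum(row)
        visual = sum(w for j, w in enumerate(row) if j in image_positions)
        shares.append(visual / total)
    return shares


def hijack_loss(
    attn: Sequence[Sequence[float]],
    image_positions: AbstractSet[int],
    eps: float = 1e-8,
) -> float:
    """Hypothetical surrogate objective: -mean(log image share).

    Minimising it pushes each response token's attention toward visual
    tokens; because a row's weights sum to 1, the combined mass on all
    non-visual (assumed competing) positions falls.
    """
    shares = image_attention_share(attn, image_positions)
    return -sum(math.log(s + eps) for s in shares) / len(shares)


# Toy layout: keys 0-1 are image tokens, keys 2-3 are text tokens.
image_dominant = [[0.45, 0.45, 0.05, 0.05], [0.40, 0.40, 0.10, 0.10]]
text_dominant = [[0.05, 0.05, 0.45, 0.45], [0.10, 0.10, 0.40, 0.40]]
print(image_attention_share(image_dominant, {0, 1}))
print(hijack_loss(image_dominant, {0, 1}) < hijack_loss(text_dominant, {0, 1}))  # True
```

The log form is one convenient choice: it penalizes response tokens whose image share is small much more than it rewards pushing an already high share higher, which loosely matches the goal of keeping the pattern image-dominant for every target token rather than on average.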
