Enhancing Targeted Adversarial Attacks on Large Vision-Language Models via Intermediate Projector
Yiming Cao, Yanjie Li, Kaisheng Liang, Bin Xiao

TL;DR
This paper introduces a novel black-box targeted attack framework on large vision-language models that leverages the projector and fine-grained query outputs to improve attack success rates and granularity, with strong transferability.
Contribution
The paper proposes the Intermediate Projector Guided Attack (IPGA) and Residual Query Alignment (RQA) modules, which exploit the Q-Former to improve attack effectiveness and transferability while preserving image content unrelated to the attack target.
Findings
IPGA outperforms baselines in global targeted attacks.
IPGA-R achieves higher success rates and content preservation in fine-grained attacks.
The attack transfers effectively to commercial VLMs such as Google Gemini and OpenAI GPT.
Abstract
The growing deployment of Large Vision-Language Models (VLMs) raises safety concerns, as adversaries may exploit model vulnerabilities to induce harmful outputs, with targeted black-box adversarial attacks posing a particularly severe threat. However, existing methods primarily maximize encoder-level global similarity, which lacks the granularity needed for stealthy and practical fine-grained attacks, where only specific targets should be altered (e.g., modifying a car while preserving its background). Moreover, they largely neglect the projector, a key semantic bridge in VLMs for multimodal alignment. To address these limitations, we propose a novel black-box targeted attack framework that leverages the projector. Specifically, we utilize the widely adopted Querying Transformer (Q-Former), which transforms global image embeddings into fine-grained query outputs, to enhance attack effectiveness…
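The contrast the abstract draws between encoder-level global similarity and fine-grained query outputs can be illustrated with a small sketch. This is not the authors' loss: the function names, the use of cosine similarity at the query level, and the shapes (32 queries of dimension 768) are assumptions chosen for illustration, and random vectors stand in for real encoder and Q-Former outputs.

```python
import numpy as np

def cosine(a, b, eps=1e-8):
    """Cosine similarity along the last axis."""
    num = np.sum(a * b, axis=-1)
    den = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1) + eps
    return num / den

def global_similarity(adv_embed, tgt_embed):
    """Encoder-level score: one number for two global embeddings of shape [d]."""
    return float(cosine(adv_embed, tgt_embed))

def query_similarity(adv_queries, tgt_queries):
    """Query-level score: mean over matched query outputs of shape [num_queries, d]."""
    return float(np.mean(cosine(adv_queries, tgt_queries)))

# Hypothetical stand-ins for projector outputs (shapes are assumptions).
rng = np.random.default_rng(0)
tgt_queries = rng.normal(size=(32, 768))
adv_queries = tgt_queries.copy()
adv_queries[:4] = rng.normal(size=(4, 768))  # only a few queries disagree

per_query = cosine(adv_queries, tgt_queries)  # one score per query, shape [32]
score = query_similarity(adv_queries, tgt_queries)
```

The per-query scores show which query outputs still disagree with the target, whereas a single global score hides that detail. In an actual attack the score would be maximized with respect to an image perturbation using a differentiable framework; the NumPy version only shows the shape of the objective.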
