Enhancing Cross-Prompt Transferability in Vision-Language Models through Contextual Injection of Target Tokens

Xikang Yang; Xuehai Tang; Fuqing Zhu; Jizhong Han; Songlin Hu

arXiv:2406.13294 · cs.MM · June 21, 2024

Open Access · 1 Repo

TL;DR

This paper introduces the Contextual-Injection Attack (CIA), which uses gradient-based perturbation to inject target tokens into both the visual and textual contexts, improving the transferability of adversarial images across prompts in vision-language models.

Contribution

The paper presents CIA, a method that enhances cross-prompt transferability by shifting contextual semantics toward the target tokens rather than the original image semantics, and reports better transferability than existing adversarial methods.

Findings

01 CIA outperforms existing methods in cross-prompt transferability

02 Effective across vision-language models such as BLIP2, InstructBLIP, and LLaVA

03 Improves adversarial attack success across prompts

Abstract

Vision-language models (VLMs) seamlessly integrate visual and textual data to perform tasks such as image classification, caption generation, and visual question answering. However, adversarial images often struggle to deceive all prompts effectively in the context of cross-prompt migration attacks, as the probability distribution of the tokens in these images tends to favor the semantics of the original image rather than the target tokens. To address this challenge, we propose a Contextual-Injection Attack (CIA) that employs gradient-based perturbation to inject target tokens into both visual and textual contexts, thereby improving the probability distribution of the target tokens. By shifting the contextual semantics towards the target tokens instead of the original image semantics, CIA enhances the cross-prompt transferability of adversarial images. Extensive experiments on the BLIP2,…
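
The idea in the abstract, a gradient-based perturbation that raises the probability of the target token in both a visual and a textual context, can be illustrated with a toy sketch. This is not the authors' implementation (see the official repository for that). It assumes two small linear heads, `W_VIS` and `W_TXT`, as hypothetical stand-ins for the visual-context and textual-context token distributions of a real VLM, a hypothetical mixing weight `alpha`, and a projected signed-gradient (PGD-style) update, which is one common choice and not necessarily the optimizer used in the paper.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def target_prob(W, x, target):
    """Probability the toy head W assigns to the target token for input x."""
    return softmax(matvec(W, x))[target]


def ce_grad_wrt_input(W, x, target):
    """Gradient of -log softmax(W x)[target] with respect to x."""
    p = softmax(matvec(W, x))
    err = [pi - (1.0 if i == target else 0.0) for i, pi in enumerate(p)]
    return [sum(W[i][j] * err[i] for i in range(len(W))) for j in range(len(x))]


def sign(v):
    return (v > 0) - (v < 0)


def clip(v, lo, hi):
    return max(lo, min(hi, v))


def toy_contextual_attack(x, W_vis, W_txt, target,
                          eps=0.1, step=0.01, iters=20, alpha=0.5):
    """Perturb x within an L-infinity ball of radius eps so that both toy
    heads put more probability on the target token.

    alpha (hypothetical) weights the visual-context loss against the
    textual-context loss.
    """
    delta = [0.0] * len(x)
    for _ in range(iters):
        x_adv = [clip(xi + di, 0.0, 1.0) for xi, di in zip(x, delta)]
        g_vis = ce_grad_wrt_input(W_vis, x_adv, target)
        g_txt = ce_grad_wrt_input(W_txt, x_adv, target)
        grad = [alpha * gv + (1.0 - alpha) * gt for gv, gt in zip(g_vis, g_txt)]
        # Signed-gradient descent on the combined loss, projected onto the eps-ball.
        delta = [clip(di - step * sign(gi), -eps, eps) for di, gi in zip(delta, grad)]
    return [clip(xi + di, 0.0, 1.0) for xi, di in zip(x, delta)]


# Toy setup: 4 "pixels", 3-token vocabulary, token index 2 is the target.
W_VIS = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
W_TXT = [[1.0, 1.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 2.0, 1.0]]
TARGET = 2
EPS = 0.1
x_clean = [0.5, 0.5, 0.5, 0.5]
x_adv = toy_contextual_attack(x_clean, W_VIS, W_TXT, TARGET, eps=EPS)

print("visual head :", target_prob(W_VIS, x_clean, TARGET), "->", target_prob(W_VIS, x_adv, TARGET))
print("textual head:", target_prob(W_TXT, x_clean, TARGET), "->", target_prob(W_TXT, x_adv, TARGET))
```

In a real attack the two linear heads would be replaced by the VLM's own token distributions and the gradients obtained by automatic differentiation; the sketch only shows how a single bounded perturbation can be optimized against a weighted sum of a visual-context and a textual-context target loss.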

Code & Models

Repositories

YancyKahn/CIA
none · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques