InstructTA: Instruction-Tuned Targeted Attack for Large Vision-Language Models
Xunguang Wang, Zhenlan Ji, Pingchuan Ma, Zongjie Li, Shuai Wang

TL;DR
This paper introduces InstructTA, a novel targeted attack method on large vision-language models that leverages instruction tuning and surrogate models to achieve high transferability without access to proprietary prompts or the underlying language model.
Contribution
We propose a new targeted attack framework, InstructTA, which enhances transferability by using instruction tuning and surrogate models based on the victim's vision encoder.
Findings
InstructTA achieves superior targeted attack success rates.
The method demonstrates high transferability across different LVLMs.
Instruction augmentation improves attack robustness.
Abstract
Large vision-language models (LVLMs) have demonstrated remarkable capability in image understanding and response generation. However, this rich visual interaction also makes LVLMs vulnerable to adversarial examples. In this paper, we formulate a novel and practical targeted attack scenario in which the adversary knows only the vision encoder of the victim LVLM, without knowledge of its prompts (which are often proprietary for service providers and not publicly available) or its underlying large language model (LLM). This practical setting poses challenges to the cross-prompt and cross-model transferability of a targeted adversarial attack, which aims to confuse the LVLM into outputting a response that is semantically similar to the attacker's chosen target text. To this end, we propose an instruction-tuned targeted attack (dubbed InstructTA) to deliver the targeted…
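To make the threat model concrete, here is a minimal sketch of a generic targeted attack that only touches an image encoder. It is not the InstructTA algorithm, whose details are cut off in the abstract above. It only illustrates the general idea of perturbing an image within a small L-infinity budget so that its embedding moves toward a chosen target embedding. The linear encoder `W`, the helper names (`encode`, `sq_dist`, `targeted_pgd`), and the budget values (`eps=8/255`, `step=1/255`, `iters=100`) are all assumptions chosen for illustration.

```python
import random


def encode(W, x):
    """Toy linear stand-in for a vision encoder (assumption): returns W @ x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def sq_dist(a, b):
    """Squared Euclidean distance between two embeddings."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def targeted_pgd(W, x_clean, target_emb, eps=8 / 255, step=1 / 255, iters=100):
    """Projected sign-gradient descent on ||encode(x) - target_emb||^2,
    constrained to the L-infinity ball of radius eps around x_clean.
    Returns the lowest-loss iterate seen (hypothetical helper)."""
    x_adv = list(x_clean)
    best, best_loss = list(x_clean), sq_dist(encode(W, x_clean), target_emb)
    for _ in range(iters):
        residual = [e - t for e, t in zip(encode(W, x_adv), target_emb)]
        for j in range(len(x_adv)):
            # Analytic gradient for the linear encoder: 2 * (W^T residual)[j].
            g = 2 * sum(W[i][j] * residual[i] for i in range(len(W)))
            sign = (g > 0) - (g < 0)
            v = x_adv[j] - step * sign
            # Project back into the eps-ball, then into the valid pixel range.
            v = min(max(v, x_clean[j] - eps), x_clean[j] + eps)
            x_adv[j] = min(max(v, 0.0), 1.0)
        loss = sq_dist(encode(W, x_adv), target_emb)
        if loss < best_loss:
            best, best_loss = list(x_adv), loss
    return best


# Toy data: a 16-"pixel" image and a 4-dimensional embedding space.
rng = random.Random(0)
n_pixels, emb_dim, eps = 16, 4, 8 / 255
W = [[rng.uniform(-1, 1) for _ in range(n_pixels)] for _ in range(emb_dim)]
x_clean = [rng.random() for _ in range(n_pixels)]
x_target = [rng.random() for _ in range(n_pixels)]
target_emb = encode(W, x_target)

x_adv = targeted_pgd(W, x_clean, target_emb, eps=eps)
before = sq_dist(encode(W, x_clean), target_emb)
after = sq_dist(encode(W, x_adv), target_emb)
print(f"distance to target embedding: {before:.4f} -> {after:.4f}")
```

In a realistic setting the gradient would come from automatic differentiation through the victim's actual vision encoder, and the target embedding would be derived from the attacker's target text rather than from a random image. Both are outside what this toy example models.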
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning · Digital Media Forensic Detection
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Dropout · Dense Connections · Byte Pair Encoding · Softmax · Layer Normalization
