InstructTA: Instruction-Tuned Targeted Attack for Large Vision-Language Models

Xunguang Wang; Zhenlan Ji; Pingchuan Ma; Zongjie Li; Shuai Wang

arXiv:2312.01886 · cs.CV · June 27, 2024 · 2 cites

TL;DR

This paper introduces InstructTA, a novel targeted attack method on large vision-language models that leverages instruction tuning and surrogate models to achieve high transferability without access to proprietary prompts or the underlying language model.

Contribution

We propose a new targeted attack framework, InstructTA, which enhances transferability by using instruction tuning and surrogate models based on the victim's vision encoder.

Findings

01. InstructTA achieves superior targeted attack success rates.

02. The method demonstrates high transferability across different LVLMs.

03. Instruction augmentation improves attack robustness.

Abstract

Large vision-language models (LVLMs) have demonstrated remarkable capability in image understanding and response generation. However, this rich visual interaction also makes LVLMs vulnerable to adversarial examples. In this paper, we formulate a novel and practical targeted attack scenario in which the adversary knows only the vision encoder of the victim LVLM, with no knowledge of its prompts (which are often proprietary for service providers and not publicly available) or its underlying large language model (LLM). This practical setting poses challenges to the cross-prompt and cross-model transferability of targeted adversarial attacks, which aim to confuse the LVLM into outputting a response that is semantically similar to the attacker's chosen target text. To this end, we propose an instruction-tuned targeted attack (dubbed InstructTA) to deliver the targeted…
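To make the threat model in the abstract concrete, the following is a minimal sketch of a generic targeted perturbation computed against a known encoder only. It is not the InstructTA method, whose details are truncated above. It assumes a toy linear map standing in for the vision encoder, a target given as an input whose features the adversarial image should match, and an L-infinity budget `eps`; the names `encode`, `sq_dist`, and `targeted_pgd` are hypothetical and chosen for illustration.

```python
def encode(W, x):
    """Toy linear stand-in for a vision encoder: returns W @ x as a list."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def targeted_pgd(W, x_clean, x_target, eps=0.1, alpha=0.01, steps=100):
    """Signed-gradient descent on ||encode(x_adv) - encode(x_target)||^2.

    The perturbation stays within [-eps, eps] per coordinate and the
    perturbed input stays within [0, 1]. The best iterate seen is returned,
    so the result is never worse than the clean input.
    """
    target_feat = encode(W, x_target)
    delta = [0.0] * len(x_clean)
    best_x = list(x_clean)
    best_loss = sq_dist(encode(W, best_x), target_feat)
    for _ in range(steps):
        x_adv = [x + d for x, d in zip(x_clean, delta)]
        residual = [f - t for f, t in zip(encode(W, x_adv), target_feat)]
        # For a linear encoder the gradient w.r.t. the input is 2 * W^T * residual.
        grad = [
            2.0 * sum(W[i][j] * residual[i] for i in range(len(W)))
            for j in range(len(x_clean))
        ]
        for j, g in enumerate(grad):
            step = alpha if g > 0 else (-alpha if g < 0 else 0.0)
            d = max(-eps, min(eps, delta[j] - step))               # L-infinity budget
            d = max(-x_clean[j], min(1.0 - x_clean[j], d))         # valid pixel range
            delta[j] = d
        x_new = [x + d for x, d in zip(x_clean, delta)]
        loss = sq_dist(encode(W, x_new), target_feat)
        if loss < best_loss:
            best_loss, best_x = loss, x_new
    return best_x


# Illustrative inputs: a 2x2 identity "encoder" and two 2-pixel "images".
W = [[1.0, 0.0], [0.0, 1.0]]
x_clean = [0.5, 0.5]
x_target = [0.9, 0.1]
x_adv = targeted_pgd(W, x_clean, x_target)
```

Because the loop touches only the encoder, a sketch like this needs neither the victim's prompt nor its language model, which is the access assumption the abstract describes. Whether such a perturbation transfers across prompts and models is exactly the difficulty the abstract raises, and this toy example does not address it.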

Code & Models

Repositories

xunguangwang/instructta (PyTorch, official)

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning · Digital Media Forensic Detection

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Dropout · Dense Connections · Byte Pair Encoding · Softmax · Layer Normalization