VL-Trojan: Multimodal Instruction Backdoor Attacks against Autoregressive Visual Language Models

Jiawei Liang; Siyuan Liang; Man Luo; Aishan Liu; Dongchen Han; Ee-Chien Chang; Xiaochun Cao

arXiv:2402.13851 · cs.CV · February 22, 2024 · 3 citations

Open Access

TL;DR

This paper introduces VL-Trojan, a novel multimodal instruction backdoor attack on autoregressive visual language models, demonstrating its effectiveness and robustness in manipulating model outputs during inference.

Contribution

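The paper presents VL-Trojan, a backdoor attack method designed to work despite two obstacles: the frozen visual encoder, which limits how well conventional image triggers can be learned, and black-box access to the victim model's parameters and architecture. It reports significantly higher attack success rates than baseline attacks.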

Findings

01. Improves attack success rate (ASR) by 62.52% over baselines

02. Effective across different model scales

03. Robust in few-shot reasoning scenarios
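For readers unfamiliar with the ASR figure cited in the findings, the sketch below shows one common way an attack success rate is computed: the fraction of triggered inputs whose model output contains the attacker's target text. This is a minimal illustration and not the paper's evaluation code. The function name `attack_success_rate`, the case-insensitive substring criterion, and the sample outputs are all assumptions made for the example; the exact matching rule the authors use is not stated on this page.

```python
from typing import Sequence


def attack_success_rate(outputs: Sequence[str], target: str) -> float:
    """Return the fraction of outputs that contain the target text.

    Assumption: success is a case-insensitive substring match. The paper
    may use a different criterion.
    """
    if not outputs:
        raise ValueError("outputs must be non-empty")
    target_norm = target.strip().lower()
    # Count outputs (generated for triggered inputs) that contain the target.
    hits = sum(target_norm in out.lower() for out in outputs)
    return hits / len(outputs)


# Hypothetical model outputs on four triggered inputs (made-up data).
outputs = ["Banana", "a dog on a sofa", "banana.", "I see a banana"]
asr = attack_success_rate(outputs, target="banana")
print(f"ASR = {asr:.2%}")  # prints "ASR = 75.00%"
```

Here three of the four outputs contain the target, so the ASR is 75%. ASR is typically reported alongside accuracy on clean inputs, since a backdoor is meant to leave normal behaviour intact.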

Abstract

Autoregressive Visual Language Models (VLMs) showcase impressive few-shot learning capabilities in a multimodal context. Recently, multimodal instruction tuning has been proposed to further enhance instruction-following abilities. However, we uncover the potential threat posed by backdoor attacks on autoregressive VLMs during instruction tuning. Adversaries can implant a backdoor by injecting poisoned samples with triggers embedded in instructions or images, enabling malicious manipulation of the victim model's predictions with predefined triggers. Nevertheless, the frozen visual encoder in autoregressive VLMs imposes constraints on the learning of conventional image triggers. Additionally, adversaries may encounter restrictions in accessing the parameters and architectures of the victim model. To address these challenges, we propose a multimodal instruction backdoor attack, namely…


Taxonomy

Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Hate Speech and Cyberbullying Detection