MTAttack: Multi-Target Backdoor Attacks against Large Vision-Language Models
Zihan Wang, Guansong Pang, Wenjun Miao, Jin Zheng, Xiao Bai

TL;DR
This paper introduces MTAttack, a novel framework for multi-target backdoor attacks on large vision-language models, demonstrating high success rates and robustness, thereby revealing significant security vulnerabilities.
Contribution
The work presents the first multi-target backdoor attack method for LVLMs, with a new optimization approach ensuring accurate multiple trigger-target mappings in latent space.
Findings
High success rate in multi-target attacks
Outperforms existing attack methods
Effective across datasets and resistant to defenses
Abstract
Recent advances in Large Visual Language Models (LVLMs) have demonstrated impressive performance across various vision-language tasks by leveraging large-scale image-text pretraining and instruction tuning. However, the security vulnerabilities of LVLMs have become increasingly concerning, particularly their susceptibility to backdoor attacks. Existing backdoor attacks focus on single-target attacks, i.e., targeting a single malicious output associated with a specific trigger. In this work, we uncover multi-target backdoor attacks, where multiple independent triggers corresponding to different attack targets are added in a single pass of training, posing a greater threat to LVLMs in real-world applications. Executing such attacks in LVLMs is challenging since there can be many incorrect trigger-target mappings due to severe feature interference among different triggers. To address this…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
Taxonomy
TopicsAdversarial Robustness in Machine Learning · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
