Revisiting Backdoor Attacks against Large Vision-Language Models from Domain Shift
Siyuan Liang, Jiawei Liang, Tianyu Pang, Chao Du, Aishan Liu, Mingli, Zhu, Xiaochun Cao, Dacheng Tao

TL;DR
This paper investigates backdoor attacks on large vision-language models during instruction tuning, emphasizing domain shift challenges and proposing a new attack method that significantly improves attack success rates under domain mismatches.
Contribution
It introduces the concept of backdoor domain generalization, analyzes factors affecting attack robustness, and proposes MABA, a novel multimodal attribution backdoor attack method.
Findings
Backdoor generalizability improves with domain-independent triggers.
Guiding models to predict triggers enhances attack robustness.
MABA achieves a 97% success rate at 0.2% poisoning rate.
Abstract
Instruction tuning enhances large vision-language models (LVLMs) but increases their vulnerability to backdoor attacks due to their open design. Unlike prior studies in static settings, this paper explores backdoor attacks in LVLM instruction tuning across mismatched training and testing domains. We introduce a new evaluation dimension, backdoor domain generalization, to assess attack robustness under visual and text domain shifts. Our findings reveal two insights: (1) backdoor generalizability improves when distinctive trigger patterns are independent of specific data domains or model architectures, and (2) the competitive interaction between trigger patterns and clean semantic regions, where guiding the model to predict triggers enhances attack generalizability. Based on these insights, we propose a multimodal attribution backdoor attack (MABA) that injects domain-agnostic triggers…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdversarial Robustness in Machine Learning · Multimodal Machine Learning Applications
MethodsSparse Evolutionary Training · Focus
