Backdoor Attack on Vision Language Models with Stealthy Semantic Manipulation

Zhiyuan Zhong; Zhen Sun; Yepang Liu; Xinlei He; Guanhong Tao

arXiv:2506.07214·cs.CV·June 10, 2025

Backdoor Attack on Vision Language Models with Stealthy Semantic Manipulation

Zhiyuan Zhong, Zhen Sun, Yepang Liu, Xinlei He, Guanhong Tao

PDF

Open Access

TL;DR

This paper uncovers a novel semantic backdoor attack on vision-language models that manipulates cross-modal semantic alignment, demonstrating high attack success rates and resistance to defenses, highlighting critical security vulnerabilities.

Contribution

The paper introduces BadSem, a new semantic backdoor attack leveraging cross-modal mismatches, and constructs SIMBad dataset for evaluation, revealing vulnerabilities in VLMs.

Findings

01

Achieves over 98% attack success rate across models

02

Generalizes well to out-of-distribution datasets

03

Defenses based on prompts and fine-tuning are ineffective

Abstract

Vision Language Models (VLMs) have shown remarkable performance, but are also vulnerable to backdoor attacks whereby the adversary can manipulate the model's outputs through hidden triggers. Prior attacks primarily rely on single-modality triggers, leaving the crucial cross-modal fusion nature of VLMs largely unexplored. Unlike prior work, we identify a novel attack surface that leverages cross-modal semantic mismatches as implicit triggers. Based on this insight, we propose BadSem (Backdoor Attack with Semantic Manipulation), a data poisoning attack that injects stealthy backdoors by deliberately misaligning image-text pairs during training. To perform the attack, we construct SIMBad, a dataset tailored for semantic manipulation involving color and object attributes. Extensive experiments across four widely used VLMs show that BadSem achieves over 98% average ASR, generalizes well to…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsAdversarial Robustness in Machine Learning · Multimodal Machine Learning Applications · Advanced Neural Network Applications

MethodsSoftmax · Attention Is All You Need · Focus