CBV: Clean-label Backdoor Attacks on Vision Language Models via Diffusion Models
Ji Guo, Xiaolong Qin, Cencen Liu, Jielei Wang, Jierun Chen, and Wenbo Jiang

TL;DR
This paper introduces CBV, a clean-label backdoor attack on vision-language models that uses diffusion models to generate natural-looking poisoned samples, achieving high attack success rates while evading traditional detection methods.
Contribution
We propose a diffusion-model-based backdoor attack that generates natural poisoned samples guided by multimodal and GradCAM information, improving both stealthiness and effectiveness.
Findings
Achieves over 80% attack success rate on multiple VLMs.
Generates natural poisoned samples that are hard to detect.
Maintains normal model functionality despite the attack.
Abstract
Vision-Language Models (VLMs) have achieved remarkable success in tasks such as image captioning and visual question answering (VQA). However, as their applications become increasingly widespread, recent studies have revealed that VLMs are vulnerable to backdoor attacks. Existing backdoor attacks on VLMs primarily rely on data poisoning by adding visual triggers and modifying text labels, where the induced image-text mismatch makes poisoned samples easy to detect. To address this limitation, we propose the Clean-Label Backdoor Attack on VLMs via Diffusion Models (CBV), which leverages diffusion models to generate natural poisoned examples via score matching. Specifically, CBV modifies the score during the reverse generation process of the diffusion model to guide the generation of poisoned samples that contain triggered image features. To further enhance the effectiveness of the attack,…
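The idea of modifying the score to steer generation can be illustrated with a toy sketch. This is not the authors' implementation: the analytic Gaussian score stands in for a learned diffusion score network, plain Langevin dynamics stands in for the reverse sampler, and the names `base_score`, `guidance_grad`, `guided_langevin`, `target`, and `weight` are all hypothetical. The quadratic guidance term is only an assumed stand-in for whatever objective pulls samples toward triggered image features.

```python
import numpy as np


def base_score(x):
    # Score (gradient of the log-density) of a standard Gaussian.
    # Assumption: stands in for a learned diffusion score network.
    return -x


def guidance_grad(x, target):
    # Gradient of -0.5 * ||x - target||^2 with respect to x.
    # Assumption: "target" plays the role of a trigger feature vector.
    return target - x


def guided_langevin(n_samples, target, weight, step=0.01, n_steps=2000, seed=0):
    # Langevin sampling in which the base score is shifted by a weighted
    # guidance gradient at every step.
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_samples, target.shape[0]))
    for _ in range(n_steps):
        score = base_score(x) + weight * guidance_grad(x, target)
        noise = rng.standard_normal(x.shape)
        x = x + step * score + np.sqrt(2.0 * step) * noise
    return x


target = np.array([2.0, -1.0])
samples = guided_langevin(n_samples=4000, target=target, weight=1.0)
unguided = guided_langevin(n_samples=4000, target=target, weight=0.0)
```

In this toy setting the guided samples concentrate around `weight * target / (1 + weight)`, while the unguided samples stay centered at the origin. A larger `weight` pulls samples closer to the target at the cost of moving further from the base distribution, which mirrors the trade-off between attack effectiveness and naturalness.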
