CBV: Clean-label Backdoor Attacks on Vision Language Models via Diffusion Models

Ji Guo, Xiaolong Qin, Cencen Liu, Jielei Wang, Jierun Chen, and Wenbo Jiang

arXiv:2605.02202 · cs.AI · May 5, 2026

TL;DR

This paper introduces CBV, a clean-label backdoor attack on vision-language models (VLMs) that uses diffusion models to generate natural-looking, stealthy poisoned samples. It reaches high attack success rates while avoiding the image-text mismatch that makes poisoned samples from existing attacks easy to detect.

Contribution

We propose a diffusion-model-based backdoor attack that generates natural poisoned samples guided by multimodal and Grad-CAM information, improving both stealthiness and attack effectiveness.
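The contribution mentions Grad-CAM guidance but this summary does not say how CBV uses it. For reference, the sketch below computes a standard Grad-CAM heatmap from one layer's feature maps and the gradients of the explained score with respect to them: each channel is weighted by its spatially averaged gradient, the weighted maps are summed, and a ReLU keeps only positive evidence. The function name and the nested-list input format are assumptions for illustration; one plausible (unconfirmed) reading is that such a map marks the image regions the model relies on.

```python
def grad_cam(activations, gradients):
    """Grad-CAM heatmap for a single convolutional layer.

    activations, gradients: K x H x W nested lists holding the feature maps A^k
    and the gradients d(score)/dA^k. Returns an H x W map scaled to [0, 1].
    """
    k_count = len(activations)
    h, w = len(activations[0]), len(activations[0][0])
    # alpha_k: global-average-pooled gradient of channel k
    alphas = [sum(sum(row) for row in g) / (h * w) for g in gradients]
    # Weighted sum of the feature maps, followed by ReLU
    cam = [[max(0.0, sum(alphas[k] * activations[k][i][j] for k in range(k_count)))
            for j in range(w)] for i in range(h)]
    peak = max(max(row) for row in cam)
    if peak == 0.0:
        return cam  # nothing positively attributed: all zeros
    return [[v / peak for v in row] for row in cam]


if __name__ == "__main__":
    A = [[[1, 0], [0, 1]], [[0, 2], [2, 0]]]
    G = [[[1, 1], [1, 1]], [[-1, -1], [-1, -1]]]
    print(grad_cam(A, G))
```

In this toy input the first channel has a positive average gradient and the second a negative one, so only the diagonal cells, where the first channel is active, survive the ReLU.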

Findings

1. Achieves an attack success rate above 80% on multiple VLMs.
2. Generates natural poisoned samples that are hard to detect.
3. Preserves the model's normal behavior on clean inputs.
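The findings report attack success rate (ASR) without defining it here. A common definition is the fraction of trigger-bearing test inputs for which the backdoored model produces the attacker's target output. The sketch below assumes, hypothetically, that success means the generated caption or answer contains a target phrase (case-insensitive); the exact criterion CBV uses is not given in this summary, and clean-input performance would be measured separately with the task's usual metric.

```python
def attack_success_rate(outputs, target_phrase):
    """Fraction of model outputs on triggered inputs that contain the
    (hypothetical) attacker target phrase, ignoring case."""
    if not outputs:
        raise ValueError("need at least one model output")
    hits = sum(target_phrase.lower() in out.lower() for out in outputs)
    return hits / len(outputs)


if __name__ == "__main__":
    triggered_outputs = [
        "a banana on a table",
        "A BANANA in the sky",
        "a dog on grass",
        "banana",
        "a cat",
    ]
    print(attack_success_rate(triggered_outputs, "banana"))
```

Here 3 of the 5 outputs contain the target phrase, giving an ASR of 0.6; a rate "over 80%" would mean more than 4 in 5 triggered inputs succeed.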

Abstract

Vision-Language Models (VLMs) have achieved remarkable success in tasks such as image captioning and visual question answering (VQA). However, as their applications become increasingly widespread, recent studies have revealed that VLMs are vulnerable to backdoor attacks. Existing backdoor attacks on VLMs primarily rely on data poisoning by adding visual triggers and modifying text labels, where the induced image-text mismatch makes poisoned samples easy to detect. To address this limitation, we propose the Clean-Label Backdoor Attack on VLMs via Diffusion Models (CBV), which leverages diffusion models to generate natural poisoned examples via score matching. Specifically, CBV modifies the score during the reverse generation process of the diffusion model to guide the generation of poisoned samples that contain triggered image features. To further enhance the effectiveness of the attack,…
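The abstract says CBV modifies the score during the reverse diffusion process so that generated samples carry trigger features. As a rough illustration of what modifying the score can mean, the sketch below adds a guidance gradient λ ∇x log g(x) to the data score and samples with annealed Langevin dynamics. Everything here is a toy assumption, not the paper's method: the "clean image" distribution is a Gaussian whose score is known analytically (standing in for a learned score network), the trigger term g, its target vector, and all parameter values are hypothetical, and annealed Langevin sampling is swapped in for whatever sampler CBV actually uses. Setting λ = 0 recovers ordinary score-based sampling.

```python
import math
import random


def data_score(x, mu, sigma, s):
    """Analytic score of a toy 'clean image' distribution N(mu, sigma^2 I)
    perturbed by Gaussian noise of std s (stand-in for a score network)."""
    var = sigma ** 2 + s ** 2
    return [-(xi - mi) / var for xi, mi in zip(x, mu)]


def trigger_grad(x, target, tau):
    """Gradient of a hypothetical trigger-feature term
    log g(x) = -||x - target||^2 / (2 tau^2), using an identity feature map."""
    return [-(xi - ti) / tau ** 2 for xi, ti in zip(x, target)]


def guided_langevin_sample(mu, sigma, target, tau, lam, sigmas, rng,
                           steps=100, eps=0.05):
    """Annealed Langevin dynamics driven by the modified score
    score(x, s) + lam * grad log g(x), over noise levels sigmas (large to small)."""
    x = [rng.gauss(0.0, sigmas[0]) for _ in mu]
    for s in sigmas:
        for _ in range(steps):
            score = data_score(x, mu, sigma, s)
            guide = trigger_grad(x, target, tau)
            # Langevin update: drift along the guided score plus Gaussian noise
            x = [xi + eps * (sc + lam * g) + math.sqrt(2.0 * eps) * rng.gauss(0.0, 1.0)
                 for xi, sc, g in zip(x, score, guide)]
    return x


def mean_coordinate(samples):
    """Mean over every coordinate of every sample."""
    flat = [v for x in samples for v in x]
    return sum(flat) / len(flat)


if __name__ == "__main__":
    mu, target, sigmas = [0.0, 0.0], [3.0, 3.0], [2.0, 1.0, 0.5, 0.1]
    rng = random.Random(42)
    clean = [guided_langevin_sample(mu, 1.0, target, 1.0, 0.0, sigmas, rng)
             for _ in range(300)]
    guided = [guided_langevin_sample(mu, 1.0, target, 1.0, 1.0, sigmas, rng)
              for _ in range(300)]
    print(f"clean mean {mean_coordinate(clean):.2f}, "
          f"guided mean {mean_coordinate(guided):.2f}")
```

With λ = 0 the samples stay centred on the clean mean of 0. With λ = 1 they drift toward the trigger target: at the final noise level s = 0.1 the guided stationary mean is (λt/τ²) / (1/(σ² + s²) + λ/τ²) = 3 / (1/1.01 + 1) ≈ 1.51 per coordinate, so samples remain a compromise between looking "clean" and carrying the trigger features.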
