VillanDiffusion: A Unified Backdoor Attack Framework for Diffusion Models
Sheng-Yen Chou, Pin-Yu Chen, Tsung-Yi Ho

TL;DR
VillanDiffusion introduces a comprehensive backdoor attack framework applicable to various diffusion models, revealing vulnerabilities and enabling detailed analysis of output manipulation risks in generative AI systems.
Contribution
It provides the first unified framework for backdoor attacks across multiple diffusion model types, enhancing understanding of vulnerabilities in generative AI.
Findings
Framework covers unconditional and conditional diffusion models (denoising-based and score-based) and various training-free samplers
Facilitates holistic backdoor vulnerability analysis
Reveals new insights into caption-based backdoor attacks
Abstract
Diffusion Models (DMs) are state-of-the-art generative models that learn a reversible corruption process from iterative noise addition and denoising. They are the backbone of many generative AI applications, such as text-to-image conditional generation. However, recent studies have shown that basic unconditional DMs (e.g., DDPM and DDIM) are vulnerable to backdoor injection, a type of output manipulation attack triggered by a maliciously embedded pattern at model input. This paper presents a unified backdoor attack framework (VillanDiffusion) to expand the current scope of backdoor analysis for DMs. Our framework covers mainstream unconditional and conditional DMs (denoising-based and score-based) and various training-free samplers for holistic evaluations. Experiments show that our unified framework facilitates the backdoor analysis of different DM configurations and provides new…
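A minimal sketch can make two terms from the abstract concrete: the DDPM forward (noise-addition) process, and a backdoor trigger embedded at the model input. For an unconditional DM, that input is the sampler's initial noise. The sketch assumes the standard DDPM linear β schedule (T = 1000, β from 1e-4 to 0.02). The additive masked trigger and the helper names `forward_noise` and `add_trigger` are hypothetical. It illustrates the general idea of an input-triggered backdoor. It is not VillanDiffusion's training objective or code.

```python
import numpy as np

# Assumed standard DDPM linear beta schedule (not taken from the paper).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bars = np.cumprod(1.0 - betas)  # abar_t = prod_{s<=t} (1 - beta_s)


def forward_noise(x0, t, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) * x_0, (1 - abar_t) * I).

    Returns the noisy sample and the Gaussian noise that produced it.
    """
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return xt, eps


def add_trigger(noise, trigger, mask):
    """Hypothetical backdoor trigger: add a fixed pattern to the masked
    region of the sampler's input noise, leaving other pixels unchanged."""
    return noise + mask * trigger


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x0 = np.zeros((8, 8))                   # toy "image"
    x_T, _ = forward_noise(x0, T - 1, rng)  # nearly pure Gaussian noise
    mask = np.zeros((8, 8), dtype=bool)
    mask[:2, :2] = True                     # trigger occupies a 2x2 corner
    poisoned = add_trigger(x_T, np.full((8, 8), 2.0), mask)
    print("abar_T =", alpha_bars[-1])
    print("changed pixels:", int((poisoned != x_T).sum()))
```

With this schedule, ᾱ_T is roughly 4e-5, so x_T is almost pure noise. A clean sampler starts from such noise. In the attack setting the abstract describes, a backdoored model would instead map noise carrying the trigger pattern to an attacker-chosen output.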
