VillanDiffusion: A Unified Backdoor Attack Framework for Diffusion Models

Sheng-Yen Chou; Pin-Yu Chen; Tsung-Yi Ho

arXiv:2306.06874 · cs.CR · January 1, 2024 · 5 cites

Open Access · 1 Repo · 1 Video

TL;DR

VillanDiffusion introduces a comprehensive backdoor attack framework applicable to various diffusion models, revealing vulnerabilities and enabling detailed analysis of output manipulation risks in generative AI systems.

Contribution

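VillanDiffusion is presented as the first unified framework for backdoor attacks on diffusion models. It covers unconditional and conditional models (denoising-based and score-based) as well as training-free samplers, which allows their vulnerabilities to be analyzed under one formulation.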

Findings

01 Framework covers multiple diffusion model types and samplers

02 Facilitates holistic backdoor vulnerability analysis

03 Reveals new insights into caption-based backdoor attacks

Abstract

Diffusion Models (DMs) are state-of-the-art generative models that learn a reversible corruption process from iterative noise addition and denoising. They are the backbone of many generative AI applications, such as text-to-image conditional generation. However, recent studies have shown that basic unconditional DMs (e.g., DDPM and DDIM) are vulnerable to backdoor injection, a type of output manipulation attack triggered by a maliciously embedded pattern at model input. This paper presents a unified backdoor attack framework (VillanDiffusion) to expand the current scope of backdoor analysis for DMs. Our framework covers mainstream unconditional and conditional DMs (denoising-based and score-based) and various training-free samplers for holistic evaluations. Experiments show that our unified framework facilitates the backdoor analysis of different DM configurations and provides new…
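To make the terms in the abstract concrete, the following is a minimal sketch of the standard closed-form DDPM noise-addition step, together with a toy illustration of what "a maliciously embedded pattern at model input" could look like. It is not the VillanDiffusion implementation: the linear schedule values, the function names, and the `stamp_trigger` helper are all assumptions made for illustration, and the paper's actual backdoored forward process and loss are not reproduced here.

```python
import math
import random


def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (a common DDPM default; assumed, not from the paper)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alpha_bars(betas):
    """Cumulative products of (1 - beta_t), one value per timestep."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def forward_noise(x0, t, abar, rng):
    """Sample x_t ~ N(sqrt(abar_t) * x0, (1 - abar_t) * I) in closed form."""
    a = abar[t]
    return [math.sqrt(a) * v + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for v in x0]


def stamp_trigger(noise, trigger, mask):
    """Hypothetical trigger: overwrite the masked positions of the input noise."""
    return [tr if m else n for n, tr, m in zip(noise, trigger, mask)]


if __name__ == "__main__":
    rng = random.Random(0)
    abar = alpha_bars(linear_betas(1000))

    x0 = [0.5, -0.5, 0.25, 0.0]                # a toy 4-value "image"
    x_late = forward_noise(x0, 999, abar, rng)  # almost pure noise at the last step

    clean_noise = [rng.gauss(0.0, 1.0) for _ in x0]
    poisoned_noise = stamp_trigger(
        clean_noise, trigger=[1.0] * 4, mask=[True, False, False, True]
    )
    print(x_late)
    print(poisoned_noise)
```

In a backdoor setting, a sampler started from `poisoned_noise` would be steered toward an attacker-chosen output, while one started from `clean_noise` would behave normally; how the model is trained to do this is the subject of the paper and is left out of this sketch.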

Code & Models

Repositories

ibm/villandiffusion
PyTorch · Official

Videos

VillanDiffusion: A Unified Backdoor Attack Framework for Diffusion Models · SlidesLive

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Model Reduction and Neural Networks · Topic Modeling