UFID: A Unified Framework for Input-level Backdoor Detection on Diffusion Models
Zihan Guan, Mengxuan Hu, Sheng Li, Anil Vullikanti

TL;DR
This paper introduces UFID, a black-box input-level backdoor detection framework for diffusion models, addressing unique challenges in generative tasks and demonstrating high effectiveness and efficiency through extensive experiments.
Contribution
The paper presents a novel causal analysis-based framework for detecting backdoors in diffusion models, specifically designed for black-box inference scenarios.
Findings
High detection accuracy across multiple datasets
Effective in both conditional and unconditional diffusion models
Fast run-time performance
Abstract
Diffusion models are vulnerable to backdoor attacks, where malicious attackers inject backdoors by poisoning certain training samples during the training stage. This poses a significant threat to real-world applications in the Model-as-a-Service (MaaS) scenario, where users query diffusion models through APIs or download them directly from the internet. To mitigate the threat of backdoor attacks under MaaS, black-box input-level backdoor detection has drawn recent interest: defenders aim to build a firewall that filters out backdoor samples at the inference stage, with access only to the input queries and the results generated by the diffusion model. Despite some preliminary explorations on traditional classification tasks, these methods cannot be directly applied to generative tasks due to two major challenges: (1) more diverse failures and (2) a multi-modality attack surface.…
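To make the black-box setting concrete, here is a minimal sketch of an input-level firewall that sees only a query and the outputs a model returns for it. The abstract is truncated before it describes the scoring rule, so everything below is an assumption made for illustration: the rule (flag a query whose outputs stay nearly identical when the query is lightly perturbed), the names `consistency_score`, `is_suspicious`, `toy_model`, and `PAD_WORDS`, the `<trigger>` token, and the 0.9 threshold are all hypothetical and should not be read as the paper's method.

```python
import math
import random
from itertools import combinations
from typing import Callable, List

Vector = List[float]

# Hypothetical filler tokens used to perturb a text query (an assumption).
PAD_WORDS = ["red", "small", "old", "bright", "quiet", "round", "soft", "tall"]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def consistency_score(
    generate: Callable[[str], Vector],
    query: str,
    n_variants: int = 4,
    seed: int = 0,
) -> float:
    """Mean pairwise similarity of outputs for perturbed copies of the query.

    Only `generate` (the black-box API) and the query are used; no weights,
    gradients, or training data are touched.
    """
    rng = random.Random(seed)
    # Append distinct filler words so every variant differs from the others.
    variants = [f"{query} {word}" for word in rng.sample(PAD_WORDS, n_variants)]
    outputs = [generate(v) for v in variants]
    sims = [cosine(a, b) for a, b in combinations(outputs, 2)]
    return sum(sims) / len(sims)


def is_suspicious(
    generate: Callable[[str], Vector], query: str, threshold: float = 0.9
) -> bool:
    """Flag the query when its outputs barely change under perturbation."""
    return consistency_score(generate, query) > threshold


def toy_model(query: str, dim: int = 32) -> Vector:
    """Stand-in for a black-box generative API (not a real diffusion model).

    A query containing the hypothetical token "<trigger>" always yields the
    same target vector; any other query yields a vector seeded by its text.
    """
    if "<trigger>" in query.split():
        return [1.0] * dim
    rng = random.Random(query)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]
```

With this toy model, `is_suspicious(toy_model, "a photo of a cat <trigger>")` flags the query, because every perturbed variant maps to the same target output. A clean prompt such as `"a photo of a cat"` is expected to score far below the threshold, since its perturbed variants produce unrelated vectors. A real deployment would replace `toy_model` with calls to the served model and `cosine` with an image-level similarity measure.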
Taxonomy
Topics: Model Reduction and Neural Networks
Methods: Focus · Diffusion
