A Dual-Purpose Framework for Backdoor Defense and Backdoor Amplification in Diffusion Models
Vu Tuan Truong, Long Bao Le

TL;DR
PureDiffusion is a framework that inverts backdoor triggers in diffusion models to improve backdoor detection and can also amplify backdoor attacks, raising detection accuracy and attack success rate while cutting backdoor training time.
Contribution
The paper introduces a dual-purpose framework with two new loss functions for trigger inversion, enabling both backdoor detection and backdoor attack amplification in diffusion models.
Findings
Achieves near-perfect backdoor detection accuracy.
Boosts attack success rate to nearly 100%.
Reduces backdoor training time by up to 20 times.
Abstract
Diffusion models have emerged as state-of-the-art generative frameworks, excelling in producing high-quality multi-modal samples. However, recent studies have revealed their vulnerability to backdoor attacks, where backdoored models generate specific, undesirable outputs, called backdoor targets (e.g., harmful images), when a pre-defined trigger is embedded in their inputs. In this paper, we propose PureDiffusion, a dual-purpose framework that simultaneously serves two contrasting roles: backdoor defense and backdoor attack amplification. For defense, we introduce two novel loss functions to invert backdoor triggers embedded in diffusion models. The first leverages trigger-induced distribution shifts across multiple timesteps of the diffusion process, while the second exploits the denoising consistency effect when a backdoor is activated. Once an accurate trigger inversion is achieved, we…
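To make the idea of trigger inversion by loss minimization concrete, here is a minimal sketch on a toy problem. It is not the PureDiffusion loss, whose exact form the abstract does not give. Everything in it is a hypothetical stand-in: `toy_backdoored_eps` replaces a real backdoored noise predictor, `SHIFT_SCALES` holds assumed per-timestep shift scales, `TRUE_TRIGGER` is a made-up hidden trigger, and the squared-error objective with finite-difference gradients is chosen only for simplicity. The sketch assumes a triggered input shifts the mean predicted noise by a known multiple of the trigger at each timestep, and searches for the trigger that best explains that shift across all timesteps.

```python
import random

# Hypothetical trigger hidden inside the toy model; the defender never reads it directly.
TRUE_TRIGGER = [0.5, -1.0, 0.25, 0.75]
# Assumed shift scale lambda_t per timestep (illustrative values, not from the paper).
SHIFT_SCALES = {0: 0.2, 1: 0.5, 2: 0.8}


def toy_backdoored_eps(x, t):
    """Toy stand-in for a backdoored noise predictor (not a real diffusion model).

    For x = noise + TRUE_TRIGGER it returns noise + lambda_t * TRUE_TRIGGER,
    i.e. the predicted noise carries a trigger-induced shift.
    """
    lam = SHIFT_SCALES[t]
    return [xi - (1.0 - lam) * d for xi, d in zip(x, TRUE_TRIGGER)]


def shift_loss(trigger, noise_batch):
    """Sum over timesteps of ||mean prediction on triggered inputs - lambda_t * trigger||^2."""
    dim = len(trigger)
    total = 0.0
    for t, lam in SHIFT_SCALES.items():
        mean_pred = [0.0] * dim
        for eps in noise_batch:
            pred = toy_backdoored_eps([e + d for e, d in zip(eps, trigger)], t)
            for i in range(dim):
                mean_pred[i] += pred[i] / len(noise_batch)
        total += sum((m - lam * d) ** 2 for m, d in zip(mean_pred, trigger))
    return total


def invert_trigger(dim, steps=50, lr=0.3, h=1e-4, seed=0):
    """Gradient descent on shift_loss, treating the model as a black box."""
    rng = random.Random(seed)
    half = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(32)]
    # Antithetic pairs (eps, -eps) make the batch mean of the noise zero.
    noise_batch = half + [[-e for e in eps] for eps in half]
    trigger = [0.0] * dim
    for _ in range(steps):
        grad = []
        for i in range(dim):
            up, down = trigger[:], trigger[:]
            up[i] += h
            down[i] -= h
            # Central finite difference for the i-th partial derivative.
            grad.append((shift_loss(up, noise_batch) - shift_loss(down, noise_batch)) / (2 * h))
        trigger = [d - lr * g for d, g in zip(trigger, grad)]
    return trigger


recovered = invert_trigger(dim=len(TRUE_TRIGGER))
```

In this toy setting the loss is quadratic with its minimum at the hidden trigger, so `recovered` ends up close to `TRUE_TRIGGER`. With a real diffusion model the objective is non-convex, and one would use automatic differentiation rather than finite differences.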
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Generative Adversarial Networks and Image Synthesis
Methods: Diffusion
