PureDiffusion: Using Backdoor to Counter Backdoor in Generative Diffusion Models
Vu Tuan Truong, Long Bao Le

TL;DR
PureDiffusion is a backdoor defense framework for diffusion models that detects backdoor attacks by inverting the triggers embedded in them. It outperforms existing defenses in fidelity and success rate, and its inverted triggers sometimes achieve a higher attack success rate than the original triggers.
Contribution
The paper introduces PureDiffusion, a defense framework that detects backdoor attacks in diffusion models by inverting the backdoor triggers embedded in them.
Findings
PureDiffusion outperforms existing defense methods in fidelity and success rate.
Triggers inverted by PureDiffusion can sometimes achieve a higher attack success rate than the original triggers.
The framework detects backdoors across a variety of trigger-target pairs.
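The findings above refer to "success rate" and "fidelity" without saying how they are computed. The sketch below shows one common way such quantities are measured for backdoored diffusion models; it is an illustration under stated assumptions, not the paper's evaluation code. It assumes that a generated sample counts as a successful attack when its mean squared error to the backdoor target falls below a threshold, and that trigger fidelity is the L2 distance between the inverted and the original trigger. The function names, the flattened-image representation, and the threshold value are all hypothetical.

```python
import math
from typing import List, Sequence

# Assumption: an image (or trigger) is a flattened sequence of pixel values.
Image = Sequence[float]


def mse(a: Image, b: Image) -> float:
    """Mean squared error between two equally sized flattened images."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def attack_success_rate(samples: List[Image], target: Image,
                        threshold: float = 0.01) -> float:
    """Fraction of generated samples that land on the backdoor target.

    Assumption: a sample is a "hit" when its MSE to the target is below
    `threshold`; the threshold value here is illustrative only.
    """
    if not samples:
        raise ValueError("need at least one sample")
    hits = sum(1 for s in samples if mse(s, target) < threshold)
    return hits / len(samples)


def trigger_distance(inverted: Image, original: Image) -> float:
    """L2 distance between an inverted trigger and the original one.

    Assumption: a lower distance is read as higher inversion fidelity.
    """
    if len(inverted) != len(original):
        raise ValueError("triggers must have the same size")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(inverted, original)))
```

Under these assumptions, an inverted trigger "surpasses" the original when samples generated with it yield a higher `attack_success_rate` than samples generated with the original trigger, even if `trigger_distance` between the two is not zero.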
Abstract
Diffusion models (DMs) are advanced deep learning models that have achieved state-of-the-art performance on a wide range of generative tasks. However, recent studies have shown that they are vulnerable to backdoor attacks, in which a backdoored DM consistently generates a designated result (e.g., a harmful image), called the backdoor target, whenever the model's input contains a backdoor trigger. Although various backdoor techniques have been investigated to attack DMs, defense methods against these threats are still limited and underexplored, especially for inverting the backdoor trigger. In this paper, we introduce PureDiffusion, a novel backdoor defense framework that can efficiently detect backdoor attacks by inverting backdoor triggers embedded in DMs. Our extensive experiments on various trigger-target pairs show that PureDiffusion outperforms existing defense methods with a large gap in terms of…
Taxonomy
Topics: Media Influence and Politics · Computational and Text Analysis Methods
