PureDiffusion: Using Backdoor to Counter Backdoor in Generative Diffusion Models

Vu Tuan Truong; Long Bao Le

arXiv:2409.13945 · cs.AI · September 24, 2024

Open Access

TL;DR

PureDiffusion is a backdoor defense framework that detects backdoor attacks in diffusion models by inverting the triggers embedded in them. It outperforms existing defenses in fidelity and success rate, and its inverted triggers sometimes achieve a higher attack success rate than the original triggers.

Contribution

The paper introduces PureDiffusion, a defense framework that detects backdoor attacks in diffusion models by inverting the backdoor triggers embedded in the models.

Findings

01

PureDiffusion outperforms existing defense methods in fidelity and success rate.

02

Triggers inverted by PureDiffusion can sometimes achieve a higher attack success rate than the original triggers.

03

The framework effectively detects various backdoor trigger-target pairs.

Abstract

Diffusion models (DMs) are advanced deep learning models that have achieved state-of-the-art capability on a wide range of generative tasks. However, recent studies have shown their vulnerability to backdoor attacks, in which backdoored DMs consistently generate a designated result (e.g., a harmful image), called the backdoor target, when the model's input contains a backdoor trigger. Although various backdoor techniques have been investigated to attack DMs, defense methods against these threats are still limited and underexplored, especially in inverting the backdoor trigger. In this paper, we introduce PureDiffusion, a novel backdoor defense framework that can efficiently detect backdoor attacks by inverting backdoor triggers embedded in DMs. Our extensive experiments on various trigger-target pairs show that PureDiffusion outperforms existing defense methods by a large margin in terms of…
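
The threat model in the abstract (a trigger in the input forces a fixed target output) can be illustrated with a toy sketch. This is not the paper's algorithm and does not perform trigger inversion; it only shows how a candidate trigger, such as one produced by an inversion procedure, might be scored. Everything here is an assumption made for illustration: the threshold-gated stand-in model, the names `make_backdoored_generator` and `attack_success_rate`, and the reading of "attack success rate" as the fraction of triggered inputs that yield the target.

```python
import numpy as np

def make_backdoored_generator(trigger, target, threshold=0.5):
    """Toy stand-in for a backdoored generative model (hypothetical, not a real DM)."""
    trigger_norm_sq = float(trigger @ trigger)

    def generate(x):
        # How much of the trigger is present in the input (projection onto the trigger).
        strength = float(x @ trigger) / trigger_norm_sq
        if strength > threshold:
            return target.copy()  # backdoor fires: always the designated target
        return np.tanh(x)         # benign behaviour: output depends on the input
    return generate

def attack_success_rate(generate, candidate_trigger, target, noise_batch, tol=1e-6):
    """Fraction of noise inputs that produce the target once the candidate is added."""
    hits = 0
    for noise in noise_batch:
        output = generate(noise + candidate_trigger)
        if np.linalg.norm(output - target) < tol:
            hits += 1
    return hits / len(noise_batch)

rng = np.random.default_rng(0)
dim = 64

# Assumed trigger: a strong pattern on the first 8 coordinates.
trigger = np.zeros(dim)
trigger[:8] = 8.0
# Assumed backdoor target: a fixed output vector.
target = np.ones(dim)

generate = make_backdoored_generator(trigger, target)
noise_batch = rng.standard_normal((200, dim))

# Score the true trigger, no trigger at all, and an imperfect candidate trigger.
asr_true = attack_success_rate(generate, trigger, target, noise_batch)
asr_clean = attack_success_rate(generate, np.zeros(dim), target, noise_batch)
asr_approx = attack_success_rate(generate, 0.8 * trigger, target, noise_batch)

print(f"true trigger: {asr_true:.2f}")
print(f"no trigger: {asr_clean:.2f}")
print(f"approximate trigger: {asr_approx:.2f}")
```

In this toy setting a candidate does not need to match the original trigger exactly to activate the backdoor, which is why a candidate is scored by its effect on the model's outputs rather than by its distance to the original trigger.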

Taxonomy

Topics: Media Influence and Politics · Computational and Text Analysis Methods