Diff-Cleanse: Identifying and Mitigating Backdoor Attacks in Diffusion Models

Jiang Hao; Xiao Jin; Hu Xiaoguang; Chen Tianyou; Zhao Jiajia

arXiv:2407.21316·cs.CR·August 23, 2024·1 cites

TL;DR

Diff-Cleanse is a new two-stage framework that detects and removes backdoors in diffusion models, significantly improving security without harming model performance.

Contribution

It introduces a trigger inversion and structural pruning approach specifically designed for diffusion models, addressing a gap in backdoor defense methods.

Findings

1. Achieves nearly 100% detection accuracy.

2. Effectively mitigates backdoor impacts.

3. Preserves benign model performance.

Abstract

Diffusion models (DMs) are regarded as among the most advanced generative models today, yet recent studies suggest that they are vulnerable to backdoor attacks. Such attacks establish hidden associations between particular input patterns and model behaviors, compromising model integrity by triggering undesirable actions on manipulated inputs. This vulnerability poses substantial risks, including reputational damage to model owners and the dissemination of harmful content. Prior work has investigated backdoor detection and model repair to mitigate this threat. However, it fails to reliably purify models backdoored by state-of-the-art attack methods, leaving the field largely underexplored. To bridge this gap, we introduce Diff-Cleanse, a novel two-stage backdoor defense framework specifically designed for DMs. The first stage employs a novel trigger…

Code & Models

Repositories

shymuel/diff-cleanse (PyTorch, official)

Taxonomy

Topics: Network Security and Intrusion Detection

Methods: Pruning