T2IShield: Defending Against Backdoors on Text-to-Image Diffusion Models

Zhongqi Wang; Jie Zhang; Shiguang Shan; Xilin Chen

arXiv:2407.04215 · cs.CV · July 18, 2024 · 1 citation

TL;DR

T2IShield is a defense framework that detects, localizes, and mitigates backdoor attacks on text-to-image diffusion models. It flags backdoor samples from their cross-attention maps, localizes the trigger inside a flagged sample with a binary search, and evaluates existing concept editing methods as a mitigation.

Contribution

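The paper identifies the "Assimilation Phenomenon" that a backdoor trigger causes in cross-attention maps. Building on it, the paper proposes two detection methods (Frobenius Norm Threshold Truncation and Covariance Discriminant Analysis) and a binary-search method for localizing the trigger.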
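
As a rough illustration of why assimilated attention maps are detectable, the sketch below scores a prompt by how far each token's cross-attention map lies from the average map, measured with the Frobenius norm. It assumes (this page does not spell it out) that the Assimilation Phenomenon means a trigger makes the per-token maps nearly identical, so a low score is suspicious. The function names, the toy 2x2 maps, and the threshold are hypothetical and are not taken from the paper or its repository.

```python
import math


def frobenius_dispersion(attn_maps):
    """Mean Frobenius distance of each token's map from the average map.

    attn_maps: list of per-token maps, each an H x W nested list of floats.
    """
    n = len(attn_maps)
    h, w = len(attn_maps[0]), len(attn_maps[0][0])
    # Element-wise average over all token maps.
    mean_map = [
        [sum(m[i][j] for m in attn_maps) / n for j in range(w)]
        for i in range(h)
    ]
    total = 0.0
    for m in attn_maps:
        squared = sum(
            (m[i][j] - mean_map[i][j]) ** 2
            for i in range(h)
            for j in range(w)
        )
        total += math.sqrt(squared)  # Frobenius norm of (m - mean_map)
    return total / n


def looks_backdoored(attn_maps, threshold):
    # Assumption: assimilated (near-identical) maps give a LOW score.
    return frobenius_dispersion(attn_maps) < threshold


# Toy data: three tokens, 2x2 maps (real maps would come from the model).
assimilated = [[[0.5, 0.5], [0.5, 0.5]] for _ in range(3)]
diverse = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 1.0], [0.0, 0.0]],
    [[0.0, 0.0], [1.0, 0.0]],
]

print(looks_backdoored(assimilated, threshold=0.1))
print(looks_backdoored(diverse, threshold=0.1))
```

In practice the threshold would have to be calibrated on known clean and poisoned prompts; the value 0.1 here only separates the two toy cases.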

Findings

01 Detection F1 score of 88.9% for backdoor samples

02 Localization F1 score of 86.4% for triggers

03 99% of poisoned samples are invalidated

Abstract

While text-to-image diffusion models demonstrate impressive generation capabilities, they also exhibit vulnerability to backdoor attacks, which involve the manipulation of model outputs through malicious triggers. In this paper, for the first time, we propose a comprehensive defense method named T2IShield to detect, localize, and mitigate such attacks. Specifically, we find the "Assimilation Phenomenon" on the cross-attention maps caused by the backdoor trigger. Based on this key insight, we propose two effective backdoor detection methods: Frobenius Norm Threshold Truncation and Covariance Discriminant Analysis. Besides, we introduce a binary-search approach to localize the trigger within a backdoor sample and assess the efficacy of existing concept editing methods in mitigating backdoor attacks. Empirical evaluations on two advanced backdoor attack scenarios show the effectiveness of…
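
The binary-search localization mentioned in the abstract can be pictured with a minimal sketch like the one below. It is not the authors' implementation: it assumes a single one-token trigger and a detector that fires on any sub-prompt containing that trigger. `localize_trigger`, `toy_detector`, and the trigger string "zq" are hypothetical stand-ins; in a real pipeline the detector would run the diffusion model and inspect its cross-attention maps.

```python
def localize_trigger(tokens, is_backdoored):
    """Narrow a flagged prompt down to one suspect token by repeated halving.

    tokens: list of prompt tokens.
    is_backdoored: callable taking a token list and returning True/False.
    Returns the suspect token, or None if the full prompt is not flagged.
    """
    if not is_backdoored(tokens):
        return None
    lo, hi = 0, len(tokens)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_backdoored(tokens[lo:mid]):
            hi = mid  # trigger lies in the left half
        else:
            lo = mid  # assumption: otherwise it lies in the right half
    return tokens[lo]


def toy_detector(tokens):
    # Hypothetical stand-in for a cross-attention-based detector.
    return "zq" in tokens


prompt = "a photo of zq dog on grass".split()
print(localize_trigger(prompt, toy_detector))
```

Each iteration halves the search range, so the number of detector calls grows logarithmically with prompt length rather than linearly, which matters when every call involves running the model.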

Code & Models

Repositories

robin-wzq/t2ishield (PyTorch, official)

Taxonomy

Topics: Comparative and International Law Studies · Credit Risk and Financial Regulations

Methods: Diffusion