Pruning for Robust Concept Erasing in Diffusion Models

Tianyun Yang; Juan Cao; Chang Xu

arXiv:2405.16534 · cs.CV · May 28, 2024 · 2 cites

Open Access

TL;DR

This paper proposes a pruning-based method to enhance the robustness of concept erasing in diffusion models, significantly reducing the reproduction of undesirable outputs like NSFW content and copyrighted artworks under adversarial prompts.

Contribution

It introduces a novel pruning strategy that selectively removes concept-related neurons, improving robustness against adversarial attacks compared to existing fine-tuning methods.
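The idea of removing concept-related neurons can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the scoring rule (mean absolute activation on concept prompts minus the same on neutral prompts), the function names `concept_scores`, `select_neurons`, and `prune`, the `prune_ratio` parameter, and the hand-written activation values. It shows only the general pattern of scoring neurons, picking the most concept-sensitive ones, and zeroing their weights so they cannot be reactivated by a different prompt.

```python
from statistics import mean

def concept_scores(acts_concept, acts_neutral):
    """Score each neuron by how much more it fires on concept prompts.

    Each argument is a list of rows, one row of neuron activations per prompt.
    The scoring rule is an illustrative assumption, not the paper's criterion.
    """
    n = len(acts_concept[0])
    return [
        mean(abs(row[j]) for row in acts_concept)
        - mean(abs(row[j]) for row in acts_neutral)
        for j in range(n)
    ]

def select_neurons(scores, prune_ratio):
    """Return the indices of the top-scoring fraction of neurons (at least one)."""
    k = max(1, round(prune_ratio * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return set(ranked[:k])

def prune(weight, pruned):
    """Zero the weight row of every pruned neuron; other rows are copied as-is."""
    return [
        [0.0] * len(row) if j in pruned else list(row)
        for j, row in enumerate(weight)
    ]

# Hypothetical activations of a 4-neuron layer on two prompts each.
acts_neutral = [[0.1, 0.2, 0.1, 0.3], [0.2, 0.1, 0.2, 0.2]]
acts_concept = [[0.1, 0.3, 2.0, 0.2], [0.2, 0.1, 1.8, 0.3]]  # neuron 2 stands out

scores = concept_scores(acts_concept, acts_neutral)
pruned = select_neurons(scores, prune_ratio=0.25)

# Hypothetical weight matrix with one row per neuron.
weight = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
pruned_weight = prune(weight, pruned)
```

In this toy setup only neuron 2 is selected, and its weight row is set to zero while the other rows are left unchanged. A real diffusion model would need activations collected from its own layers, and the paper's actual selection criterion may differ from the one assumed here.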

Findings

01 Nearly 40% improvement in erasing NSFW content

02 30% enhancement in removing artwork style

03 Significant robustness against adversarial prompts

Abstract

Despite their impressive image-generation capabilities, text-to-image diffusion models are susceptible to producing undesirable outputs such as NSFW content and copyrighted artworks. To address this issue, recent studies have focused on fine-tuning model parameters to erase problematic concepts. However, existing methods exhibit a major flaw in robustness: fine-tuned models often reproduce the undesirable outputs when faced with cleverly crafted prompts. This reveals a fundamental limitation of current approaches and may raise risks for the deployment of diffusion models in the open world. To address this gap, we locate the concept-correlated neurons and find that these neurons show high sensitivity to adversarial prompts, and thus could be deactivated during erasing yet reactivated under attack. To improve the robustness, we introduce a new pruning-based strategy for concept…

Taxonomy

Topics: Bayesian Modeling and Causal Inference

Methods: Diffusion