Erasing Undesirable Concepts in Diffusion Models with Adversarial Preservation

Anh Bui; Long Vuong; Khanh Doan; Trung Le; Paul Montague; Tamas Abraham; Dinh Phung

arXiv:2410.15618·cs.LG·May 26, 2025

TL;DR

This paper introduces a method for erasing undesirable concepts from diffusion models that identifies and preserves adversarial concepts, the concepts most affected by the parameter changes, so that harmful content is removed while unrelated concepts are kept intact.

Contribution

The proposed approach identifies the concepts most affected by parameter changes during erasure, termed adversarial concepts, and preserves them, improving erasure stability and model integrity.

Findings

1. Outperforms state-of-the-art erasure methods

2. Effectively eliminates unwanted content

3. Maintains unrelated content integrity

Abstract

Diffusion models excel at generating visually striking content from text but can inadvertently produce undesirable or harmful content when trained on unfiltered internet data. A practical solution is to selectively remove target concepts from the model, but this may impact the remaining concepts. Prior approaches have tried to balance this by introducing a loss term to preserve neutral content or a regularization term to minimize changes in the model parameters, yet resolving this trade-off remains challenging. In this work, we propose to identify and preserve the concepts most affected by parameter changes, termed "adversarial concepts". This approach ensures stable erasure with minimal impact on the other concepts. We demonstrate the effectiveness of our method using the Stable Diffusion model, showing that it outperforms state-of-the-art erasure methods in eliminating…
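The idea of picking out the concept "most affected by parameter changes" can be illustrated with a toy sketch. It is not the authors' implementation: the function names (`most_affected_concept`, `squared_distance`), the stand-in `original` and `edited` models, and the use of a squared L2 distance over a small fixed candidate list are all assumptions made for illustration. The sketch compares the outputs of a model before and after an edit for each candidate concept and selects the one whose output moved the most; that change could then serve as a preservation penalty.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Model = Callable[[Vector], List[float]]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared L2 distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def most_affected_concept(
    original: Model, edited: Model, concepts: Sequence[Vector]
) -> Tuple[int, List[float]]:
    """Return the index of the concept whose output changes most, plus all changes."""
    changes = [squared_distance(original(c), edited(c)) for c in concepts]
    idx = max(range(len(changes)), key=changes.__getitem__)
    return idx, changes


# Hypothetical stand-ins for a model before and after an erasure update.
def original(c: Vector) -> List[float]:
    return list(c)


def edited(c: Vector) -> List[float]:
    # The edit leaves dimension 0 alone, strongly disturbs dimension 1,
    # and slightly disturbs dimension 2.
    return [c[0], 1.5 * c[1], 1.1 * c[2]]


# Three toy concept embeddings (one-hot vectors, assumed for illustration).
concepts = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

idx, changes = most_affected_concept(original, edited, concepts)
preservation_loss = changes[idx]  # penalty one might add to the erasure objective
```

In this toy setup the second concept is selected because the edit distorts it most. A real system would search a far larger concept space and work with a diffusion model's noise predictions rather than hand-written functions.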

Taxonomy

Topics: Adversarial Robustness in Machine Learning

Methods: Diffusion