When Are Concepts Erased From Diffusion Models?

Kevin Lu; Nicky Kriplani; Rohit Gandikota; Minh Pham; David Bau; Chinmay Hegde; Niv Cohen

arXiv:2505.17013 · cs.LG · November 10, 2025

TL;DR

This paper asks how thoroughly concept erasure actually removes a target concept from a diffusion model. It proposes two conceptual models of the erasure mechanism and introduces a suite of probing techniques to test whether the concept is truly gone, arguing that erasure needs more comprehensive assessment.

Contribution

The paper introduces two conceptual models for concept erasure in diffusion models and develops a suite of probing techniques to evaluate erasure robustness comprehensively.

Findings

01. Proposed two mechanisms: interference with guidance and likelihood reduction.

02. Developed diverse probing techniques for thorough evaluation.

03. Emphasized importance of robustness testing beyond adversarial inputs.

Abstract

In concept erasure, a model is modified to selectively prevent it from generating a target concept. Despite the rapid development of new methods, it remains unclear how thoroughly these approaches remove the target concept from the model. We begin by proposing two conceptual models for the erasure mechanism in diffusion models: (i) interfering with the model's internal guidance processes, and (ii) reducing the unconditional likelihood of generating the target concept, potentially removing it entirely. To assess whether a concept has been truly erased from the model, we introduce a comprehensive suite of independent probing techniques: supplying visual context, modifying the diffusion trajectory, applying classifier guidance, and analyzing the model's alternative generations that emerge in place of the erased concept. Our results shed light on the value of exploring concept erasure…
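One way to picture mechanism (i) is through standard classifier-free guidance, where the guided noise prediction is the unconditional prediction plus a scaled difference between the conditional and unconditional predictions. The sketch below is a toy illustration only: the function name `cfg_combine`, the two-element "noise predictions", and the assumption that erasure collapses the target prompt's conditional prediction onto the unconditional one are all hypothetical and not taken from the paper.

```python
from typing import List


def cfg_combine(eps_uncond: List[float], eps_cond: List[float], scale: float) -> List[float]:
    """Classifier-free guidance: eps_uncond + scale * (eps_cond - eps_uncond)."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Hypothetical toy noise predictions (illustrative values, not from the paper).
eps_uncond = [0.0, 1.0]

# Original model: prompting with the target concept shifts the prediction.
eps_cond_original = [1.0, 3.0]

# Toy "guidance interference": after erasure, the target prompt no longer
# moves the prediction away from the unconditional one (an assumption).
eps_cond_erased = list(eps_uncond)

guided_original = cfg_combine(eps_uncond, eps_cond_original, scale=7.5)
guided_erased = cfg_combine(eps_uncond, eps_cond_erased, scale=7.5)

print(guided_original)  # guidance pushes strongly toward the concept
print(guided_erased)    # guidance term vanishes; output equals eps_uncond
```

In this toy picture the unconditional prediction is left untouched, which is what separates it from mechanism (ii), where the unconditional likelihood of the concept itself would be reduced.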

Code & Models

Repositories

kevinlu4588/diffusionconcepterasure
jax · Official

Taxonomy

Topics: Misinformation and Its Impacts · Domain Adaptation and Few-Shot Learning · Topic Modeling

Methods: Diffusion