When Are Concepts Erased From Diffusion Models?
Kevin Lu, Nicky Kriplani, Rohit Gandikota, Minh Pham, David Bau, Chinmay Hegde, Niv Cohen

TL;DR
This paper investigates how thoroughly concept erasure methods remove a target concept from diffusion models. It proposes two conceptual models of the erasure mechanism and introduces a suite of independent probing techniques to test whether the concept is truly gone, highlighting the need for comprehensive evaluation.
Contribution
The paper introduces two conceptual models for concept erasure in diffusion models and develops a suite of probing techniques to evaluate erasure robustness comprehensively.
Findings
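Proposes two conceptual models of erasure: interfering with the model's internal guidance processes, and reducing the unconditional likelihood of generating the target concept.
Develops independent probing techniques: supplying visual context, modifying the diffusion trajectory, applying classifier guidance, and analyzing the alternative generations that emerge in place of the erased concept.
Emphasizes the importance of robustness testing beyond adversarial inputs.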
Abstract
In concept erasure, a model is modified to selectively prevent it from generating a target concept. Despite the rapid development of new methods, it remains unclear how thoroughly these approaches remove the target concept from the model. We begin by proposing two conceptual models for the erasure mechanism in diffusion models: (i) interfering with the model's internal guidance processes, and (ii) reducing the unconditional likelihood of generating the target concept, potentially removing it entirely. To assess whether a concept has been truly erased from the model, we introduce a comprehensive suite of independent probing techniques: supplying visual context, modifying the diffusion trajectory, applying classifier guidance, and analyzing the model's alternative generations that emerge in place of the erased concept. Our results shed light on the value of exploring concept erasure…
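The first mechanism in the abstract can be illustrated with a toy sketch. It assumes that "guidance" refers to standard classifier-free guidance, where the final noise prediction is the unconditional prediction pushed toward the conditional one; the abstract does not confirm this, and the function name, the two-element vectors, and the "interference" scenario below are hypothetical illustrations rather than the authors' method.

```python
from typing import List


def cfg_combine(eps_uncond: List[float], eps_cond: List[float], scale: float) -> List[float]:
    """Standard classifier-free guidance combination (assumed setting):
    eps = eps_uncond + scale * (eps_cond - eps_uncond)."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Toy noise predictions; real models predict image-sized tensors.
eps_uncond = [0.0, 1.0]
eps_cond = [1.0, 3.0]

# Normal guidance: the prompt steers the prediction away from the unconditional one.
guided = cfg_combine(eps_uncond, eps_cond, scale=2.0)

# Hypothetical "guidance interference": if erasure makes the conditional
# prediction for the target prompt collapse onto the unconditional one,
# the prompt no longer steers generation, whatever the guidance scale.
interfered = cfg_combine(eps_uncond, eps_uncond, scale=2.0)

print(guided)      # steered prediction
print(interfered)  # identical to eps_uncond
```

In this sketch the unconditional prediction itself is untouched, so the concept could in principle still be reached by other routes. That is the distinction from the second mechanism, which lowers the likelihood of the concept regardless of the prompt.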
Taxonomy
Topics: Misinformation and Its Impacts · Domain Adaptation and Few-Shot Learning · Topic Modeling
Methods: Diffusion
