Fantastic Targets for Concept Erasure in Diffusion Models and Where To Find Them

Anh Bui; Trang Vu; Long Vuong; Trung Le; Paul Montague; Tamas Abraham; Junae Kim; Dinh Phung

arXiv:2501.18950·cs.LG·May 26, 2025

TL;DR

This paper introduces a novel concept erasure method for diffusion models that dynamically selects optimal targets, improving the balance between effective removal of undesirable concepts and preservation of unrelated ones.

Contribution

The paper proposes the Adaptive Guided Erasure (AGE) method, modeling concept interactions as a graph and dynamically choosing targets to enhance erasure effectiveness and reduce side effects.
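The graph view described above can be illustrated with a toy example. The sketch below is an assumption-laden illustration, not the authors' implementation: the `impact` table, its numbers, the `affected_neighbors` helper, and the threshold are all hypothetical. Nodes are concepts, and a weighted edge from `a` to `b` records how much erasing `a` degrades generation of `b`.

```python
from typing import Dict, List

# Hypothetical impact scores, invented for illustration only.
# impact[a][b] = how much generation of concept b degrades after erasing a.
impact: Dict[str, Dict[str, float]] = {
    "cat": {"kitten": 0.80, "dog": 0.30, "truck": 0.02},
    "truck": {"bus": 0.70, "car": 0.40, "cat": 0.01},
}


def affected_neighbors(
    graph: Dict[str, Dict[str, float]], concept: str, threshold: float = 0.25
) -> List[str]:
    """Return concepts whose impact score is at least `threshold`, strongest first."""
    edges = graph.get(concept, {})
    hits = [(name, weight) for name, weight in edges.items() if weight >= threshold]
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in hits]


if __name__ == "__main__":
    # With these made-up scores, only nearby concepts pass the threshold.
    print(affected_neighbors(impact, "cat"))
```

In this toy table, erasing "cat" mostly touches semantically close concepts and leaves "truck" almost unchanged, which is the kind of locality the paper reports empirically.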

Findings

01. AGE outperforms existing methods in concept erasure tasks.

02. Concept influence is localized in the concept space.

03. Dynamic target selection reduces unintended side effects.
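To make the contrast between a fixed target and a dynamically chosen one concrete, here is a minimal sketch. It is not the AGE objective: the 2-D embeddings are invented, and "pick the most similar candidate by cosine similarity" is a stand-in criterion assumed purely for illustration. The names `fixed_target` and `adaptive_target` are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy 2-D concept embeddings, invented for illustration only.
embeddings: Dict[str, List[float]] = {
    "golden retriever": [0.9, 0.1],
    "dog": [1.0, 0.2],
    "church": [0.1, 0.9],
    "building": [0.2, 1.0],
    "a photo": [0.5, 0.5],
}


def fixed_target(concept: str, generic: str = "a photo") -> str:
    """Fixed-target strategy: every erased concept maps to the same generic concept."""
    return generic


def adaptive_target(
    concept: str, candidates: List[str], emb: Dict[str, List[float]]
) -> str:
    """Stand-in dynamic strategy: choose the candidate closest to the erased concept."""
    options = [c for c in candidates if c != concept]
    return max(options, key=lambda c: cosine(emb[concept], emb[c]))


if __name__ == "__main__":
    candidates = ["dog", "building", "a photo"]
    for concept in ["golden retriever", "church"]:
        print(concept, fixed_target(concept), adaptive_target(concept, candidates, embeddings))
```

With these toy vectors the fixed strategy sends both concepts to "a photo", while the stand-in dynamic rule picks a different target for each concept. Whether a nearby target is actually the better choice is exactly what the paper investigates, so the selection rule here should be read as a placeholder.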

Abstract

Concept erasure has emerged as a promising technique for mitigating the risk of harmful content generation in diffusion models by selectively unlearning undesirable concepts. The common principle of previous works to remove a specific concept is to map it to a fixed generic concept, such as a neutral concept or just an empty text prompt. In this paper, we demonstrate that this fixed-target strategy is suboptimal, as it fails to account for the impact of erasing one concept on the others. To address this limitation, we model the concept space as a graph and empirically analyze the effects of erasing one concept on the remaining concepts. Our analysis uncovers intriguing geometric properties of the concept space, where the influence of erasing a concept is confined to a local region. Building on this insight, we propose the Adaptive Guided Erasure (AGE) method, which dynamically…

Peer Reviews

Decision·ICLR 2025 Poster

Reviewer 01·Rating 8·Confidence 4

Strengths

Originality: The paper presents a novel approach to concept erasure in diffusion models by introducing the Adaptive Guided Erasure (AGE) method. This method departs from the traditional fixed-target strategy by dynamically selecting optimal target concepts, which is a significant innovation in the field. The modeling of the concept space as a graph to understand the localized impact of concept erasure is a creative and original contribution. This approach not only addresses the limitations of ex

Weaknesses

Target Concept Selection: The paper could benefit from a more detailed exploration of the target concept selection process, including specific examples and potential challenges.
Scalability: The scalability of the minimax optimization approach for large concept spaces is not fully addressed. Discussing computational complexity and optimization strategies would be helpful.
Generalization: The method's applicability to different types of diffusion or generative models is not thoroughly explored. Ad

Reviewer 02·Rating 6·Confidence 3

Strengths

Concept erasure for reducing harmful content creation is clearly a highly important and impactful research area. The proposed approach has several strengths:
* Clever and intuitive knowledge graph-based approach
* Clear and well motivated storyline for the proposed objective
* Wide variety of empirical experiments

Weaknesses

I think that the paper could be improved by the following:
* The general techniques and ideas appear as not very complex or novel (rather an application of related ideas, e.g. classic graph-based approaches) to new problems. On some level, the depth of empirical analysis makes up for this. However, the reader is left feeling as though there could have been more methodological innovation in the work.
* The presentation of results could be a bit clearer to show more of where gains come from. For

Reviewer 03·Rating 5·Confidence 2

Strengths

1. AGE introduces an adaptive erasure approach that refines concept targeting by leveraging graph-based insights about concept space structure.
2. The method minimizes unintended impacts on unrelated concepts, which addresses a notable limitation in previous fixed-target erasure methods.
3. The findings are validated across different models, reinforcing AGE’s potential adaptability to various generative tasks and model architectures.

Weaknesses

1. The optimization procedure for selecting target concepts may be computationally demanding for models with large concept spaces, which could limit AGE’s scalability in practice.
2. The approach’s effectiveness relies on the accuracy of the concept graph’s structure. Any inaccuracies in capturing semantic relationships may affect erasure outcomes. Is there any discussion regarding this?
3. Are there any human evaluations of artistic style?
4. The proposed method doesn't achieve the SOTA, like T

Code & Models

Repositories

tuananhbui89/adaptive-guided-erasure
pytorch·Official

Taxonomy

Topics·Statistical and Computational Modeling

Methods·Diffusion