A Concept is More Than a Word: Diversified Unlearning in Text-to-Image Diffusion Models

Duc Hao Pham; Van Duy Truong; Duy Khanh Dinh; Tien Cuong Nguyen; Dien Hy Ngo; Tuan Anh Bui

arXiv:2603.18767·cs.AI·March 20, 2026

Open Access · 3 Reviews

TL;DR

This paper introduces Diversified Unlearning, a novel framework for more precise and robust concept removal in text-to-image diffusion models by using diverse prompts instead of keywords, improving safety and reliability.

Contribution

It proposes a distributional approach to concept unlearning that overcomes keyword limitations, enhancing erasure accuracy and robustness in diffusion models.

Findings

01

Outperforms existing methods in concept erasure accuracy

02

Maintains better retention of unrelated concepts

03

Shows increased robustness against adversarial recovery attacks

Abstract

Concept unlearning has emerged as a promising direction for reducing the risks of harmful content generation in text-to-image diffusion models by selectively erasing undesirable concepts from a model's parameters. Existing approaches typically rely on keywords to identify the target concept to be unlearned. However, we show that this keyword-based formulation is inherently limited: a visual concept is multi-dimensional, can be expressed in diverse textual forms, and often overlaps with related concepts in the latent space. As a result, keyword-only unlearning, which only imprecisely indicates the target concept, is brittle and prone to over-forgetting. This occurs because a single keyword represents only a narrow point estimate of the concept, failing to cover its full semantic distribution and entangled variations in the latent space. To address this limitation, we propose Diversified Unlearning, a…

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 2 · Confidence 5

Strengths

1. The paper is well written and easy to follow.
2. The proposed method is intuitive and targets the stated issue.

Weaknesses

1. More verification experiments are required.
   a) First, one immediate approach is to use prompts to construct pairs of data, for example <p1, p2>, where p1 is a prompt containing the word nudity and p2 is the same prompt without it, and then train the model on these pairs. In this scenario, what is the advantage of the proposed method?
   b) How do we evaluate the precision of the sum operation applied directly to the embedding vectors? As we know, the latent space is complex and the
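The pairwise baseline raised in point (a) can be made concrete with a minimal sketch. It only illustrates how <p1, p2> prompt pairs might be built by deleting a keyword; it is not the paper's method or the reviewer's exact proposal. The function names `make_prompt_pair` and `build_pairs` and the example prompts are hypothetical, and the training step that would consume the pairs is omitted.

```python
import re
from typing import List, Tuple


def make_prompt_pair(prompt: str, keyword: str) -> Tuple[str, str]:
    """Return (p1, p2): p1 is the original prompt, p2 is p1 with the keyword removed.

    Hypothetical helper for illustration; simple whole-word, case-insensitive removal.
    """
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", flags=re.IGNORECASE)
    p2 = pattern.sub("", prompt)
    # Collapse the whitespace left behind by the removal.
    p2 = re.sub(r"\s+", " ", p2)
    # Drop any space that now sits directly before punctuation.
    p2 = re.sub(r"\s+([,.;:!?])", r"\1", p2).strip()
    return prompt, p2


def build_pairs(prompts: List[str], keyword: str) -> List[Tuple[str, str]]:
    """Build <p1, p2> pairs, skipping prompts that do not contain the keyword."""
    pairs = []
    for prompt in prompts:
        p1, p2 = make_prompt_pair(prompt, keyword)
        if p1 != p2:  # the keyword was present, so the pair is informative
            pairs.append((p1, p2))
    return pairs


if __name__ == "__main__":
    example_prompts = [
        "a nudity painting in a museum",
        "a photo of a dog in a park",
    ]
    for p1, p2 in build_pairs(example_prompts, "nudity"):
        print(f"p1: {p1!r}\np2: {p2!r}")
```

A pair built this way differs only in the literal keyword, so it inherits the limitation the abstract describes: paraphrases that express the same concept without the keyword produce no pair at all.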

Reviewer 02 · Rating 6 · Confidence 3

Strengths

1. The paper shifts the perspective from keyword-based unlearning to distributional unlearning, offering a novel conceptualization of representing concepts as distributions. This perspective provides a promising direction for developing more robust unlearning techniques.
2. The proposed approach functions as a plug-in module compatible with existing unlearning methods, effectively enhancing their performance.
3. Extensive experiments on Stable Diffusion 1.4 demonstrate the effectiveness of the p

Weaknesses

1. Although distributional unlearning is an insightful perspective, the paper lacks a systematic exploration of how distributional ranges differ across concept categories (e.g., celebrity, copyrighted character, explicit concept), and how corresponding contextualized prompts should be designed for each category.
2. The improvement of the proposed method over robust unlearning methods such as [1][2] is not sufficiently demonstrated.
3. All experiments are conducted on Stable Diffusion 1.4, leavin

Reviewer 03 · Rating 6 · Confidence 4

Strengths

- The proposed method is simple yet effective in improving concept erasure performance.
- The paper conducts comprehensive experiments across five unlearning domains and provides a detailed analysis of prompt diversity, exploring contextual complexity from level 0 to level 7 and varying the number of prompts from 5 to 50.
- The paper is well-written and clear. The example in lines 239-242 helps me understand the diversified embedding mixup mechanism.

Weaknesses

- The method shows limited effectiveness in concept preservation. In the copyrighted concept erasure task (Table 1 right), preservation improvements are marginal, about 0.02 in LPIPS for ESD, AP, AGE, and ACE, and less than 1 point in CLIP-i, CLIP-t, and GPT scores. Similarly, in the nudity erasure task (Table 2), preservation improvements in FID and CLIP metrics are negligible or even degrade.
- While the paper mentions adversarially trained unlearning approaches such as AdvUnlearn, R.A.C.E., R

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis · Hate Speech and Cyberbullying Detection