Holistic Unlearning Benchmark: A Multi-Faceted Evaluation for Text-to-Image Diffusion Model Unlearning

Saemi Moon; Minjong Lee; Sangdon Park; Dongwoo Kim

arXiv:2410.05664 · cs.CV · November 11, 2025

TL;DR

This paper introduces a comprehensive benchmark for evaluating unlearning methods in text-to-image diffusion models across multiple dimensions, addressing broader impacts beyond concept removal quality.

Contribution

It proposes the Holistic Unlearning Benchmark (HUB), a multi-faceted evaluation framework covering six key criteria for assessing unlearning methods.

Findings

1. No single method excels across all criteria.
2. The benchmark covers 33 concepts with 16,000 prompts each.
3. The code and dataset are released to foster further research.

Abstract

As text-to-image diffusion models gain widespread commercial applications, there are increasing concerns about unethical or harmful use, including the unauthorized generation of copyrighted or sensitive content. Concept unlearning has emerged as a promising solution to these challenges by removing undesired and harmful information from the pre-trained model. However, previous evaluations primarily focus on whether target concepts are removed while image quality is preserved, neglecting broader impacts such as unintended side effects. In this work, we propose the Holistic Unlearning Benchmark (HUB), a comprehensive framework for evaluating unlearning methods across six key dimensions: faithfulness, alignment, pinpoint-ness, multilingual robustness, attack robustness, and efficiency. Our benchmark covers 33 target concepts, including 16,000 prompts per concept, spanning four categories:…
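The six dimensions and the scale stated in the abstract can be made concrete with a small bookkeeping sketch. This is a hypothetical illustration, not HUB's released code. The function names `leaders` and `universal_winner` and the convention that every score is oriented so that higher is better are assumptions. The snippet records the six criteria, computes the total prompt count implied by 33 concepts with 16,000 prompts each, and checks whether any method leads on every criterion. That last check is the question behind the first finding.

```python
from typing import Dict, List

# The six evaluation dimensions named in the HUB abstract.
CRITERIA = [
    "faithfulness",
    "alignment",
    "pinpoint-ness",
    "multilingual robustness",
    "attack robustness",
    "efficiency",
]

# Scale stated in the abstract: 33 target concepts, 16,000 prompts per concept.
NUM_CONCEPTS = 33
PROMPTS_PER_CONCEPT = 16_000
TOTAL_PROMPTS = NUM_CONCEPTS * PROMPTS_PER_CONCEPT  # 528,000 prompts in total


def leaders(scores: Dict[str, Dict[str, float]]) -> Dict[str, List[str]]:
    """Hypothetical helper: for each criterion, return the top-scoring method(s).

    Assumes every score is oriented so that higher is better; a cost such as
    runtime would need to be negated before being passed in.
    """
    result: Dict[str, List[str]] = {}
    for criterion in CRITERIA:
        best = max(method_scores[criterion] for method_scores in scores.values())
        result[criterion] = sorted(
            method
            for method, method_scores in scores.items()
            if method_scores[criterion] == best
        )
    return result


def universal_winner(scores: Dict[str, Dict[str, float]]) -> List[str]:
    """Hypothetical helper: methods that lead (ties allowed) on every criterion."""
    candidates = set(scores)
    for methods in leaders(scores).values():
        candidates &= set(methods)
    return sorted(candidates)
```

Ties count as leading, so a method that matches the best score on every criterion is reported as a universal winner. An empty result means no single method excels across all criteria. The scores themselves would come from the benchmark's own evaluations, which this sketch does not reproduce.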

Peer Reviews

Decision: ICLR 2025 Conference Withdrawn Submission

Reviewer 01 · Rating 5 · Confidence 3

Strengths

The paper introduces a novel and comprehensive framework specifically designed for benchmarking unlearning methods in text-to-image diffusion models, which is crucial for ensuring ethical and responsible AI usage. Specifically, the paper evaluates six methods from five different aspects. The experimental settings are clearly presented, and the takeaway notes are interesting. Overall, it raises awareness of the need to evaluate unlearning methods for their generalization ability.

Weaknesses

While the paper proposes an interesting evaluation framework for unlearning methods, it does not present an in-depth discussion in each experiment that technically analyzes how a given method affects performance. Therefore, it lacks a comprehensive technical analysis of current unlearning methods. Unmatched Content to Purpose: For instance, in Sec. 5.2, the stated purpose is to find out how unlearning changes the underlying estimated distribution, while the experime…

Reviewer 02 · Rating 5 · Confidence 3

Strengths

- The paper is well-structured and clearly written, making it accessible and easy to follow.
- By proposing a new benchmark for unlearning, the paper addresses an emerging direction in generative model research. Given the increasing power and prevalence of generative models, it is timely and important to examine ways to prevent harmful content generation.

Weaknesses

### Limited Scope and Specificity of the Proposed Benchmark
- **Narrow Methodology**: The benchmark focuses on only six existing unlearning methods, limiting its generalizability. It is tailored specifically to these methods rather than serving as a more widely applicable benchmark.
- **Restricted Target Concepts**: The benchmark is tested on just four target concepts, which may not sufficiently represent real-world applications. Additionally, unlearning is often applied to remove harmful or in…

Reviewer 03 · Rating 5 · Confidence 3

Strengths

- The observation that existing unlearning methods perform differently depending on the complexity of the prompts used to test concept removal is insightful.
- The paper suggests several ways to measure model performance more precisely, which contribute valuable analyses.
- The measurement of influence on related concepts in Section 5.1 appears to be a useful approach for thoroughly testing the over-erasing issue in unlearning.
- Section 5.2’s sampling from an unconditional model provides a way…

Weaknesses

- Although this is a paper focused on benchmarking and analysis, the number of concepts used was limited, and there were no experiments related to violence, nudity, or copyright issues, which are topics of particular interest in unlearning. Wouldn’t it be more beneficial to increase the number of concepts rather than reduce the number of prompts for each?
- The value of some additional performance analyses is unclear to me.
- In Figure 2, using simple and diverse prompts does not seem to add signi…


Taxonomy

Topics: Mathematics, Computing, and Information Processing

Methods: Diffusion