Unlearning or Concealment? A Critical Analysis and Evaluation Metrics for Unlearning in Diffusion Models
Aakash Sen Sharma, Niladri Sarkar, Vikram Chundawat, Ankur A Mali, Murari Mandal

TL;DR
This paper critically examines the effectiveness of current unlearning methods in diffusion models, revealing they often conceal rather than erase targeted concepts, and introduces new metrics for better evaluation.
Contribution
It provides a comprehensive analysis of existing unlearning techniques, identifies their weaknesses, and proposes two novel metrics, CRS and CCS, for more robust assessment.
Findings
Existing methods often conceal concepts instead of fully removing them.
CRS and CCS metrics effectively measure concept retention and confidence.
Current unlearning techniques show significant shortcomings in true concept erasure.
Abstract
Recent research has seen significant interest in methods for concept removal and targeted forgetting in text-to-image diffusion models. In this paper, we conduct a comprehensive white-box analysis showing the vulnerabilities in existing diffusion model unlearning methods. We show that existing unlearning methods lead to decoupling of the targeted concepts (meant to be forgotten) for the corresponding prompts. This is concealment and not actual forgetting, which was the original goal. This paper presents a rigorous theoretical and empirical examination of five commonly used techniques for unlearning in diffusion models, while showing their potential weaknesses. We introduce two new evaluation metrics: Concept Retrieval Score (CRS) and Concept Confidence Score (CCS). These metrics are based on a successful adversarial attack setup that can recover forgotten…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning
Methods: Diffusion · Focus
