Unlearning or Concealment? A Critical Analysis and Evaluation Metrics for Unlearning in Diffusion Models

Aakash Sen Sharma; Niladri Sarkar; Vikram Chundawat; Ankur A Mali; Murari Mandal

arXiv:2409.05668·cs.LG·December 13, 2024

TL;DR

This paper critically examines the effectiveness of current unlearning methods in diffusion models, revealing they often conceal rather than erase targeted concepts, and introduces new metrics for better evaluation.

Contribution

The paper provides a comprehensive analysis of existing unlearning techniques, identifies their weaknesses, and proposes two novel metrics, CRS and CCS, for more robust assessment.

Findings

1. Existing methods often conceal concepts instead of fully removing them.
2. CRS and CCS metrics effectively measure concept retention and confidence.
3. Current unlearning techniques show significant shortcomings in true concept erasure.

Abstract

Recent research has seen significant interest in methods for concept removal and targeted forgetting in text-to-image diffusion models. In this paper, we conduct a comprehensive white-box analysis that exposes vulnerabilities in existing diffusion model unlearning methods. We show that existing unlearning methods lead to decoupling of the targeted concepts (meant to be forgotten) from the corresponding prompts. This is concealment, not actual forgetting, which was the original goal. This paper presents a rigorous theoretical and empirical examination of five commonly used techniques for unlearning in diffusion models and exposes their potential weaknesses. We introduce two new evaluation metrics: Concept Retrieval Score (CRS) and Concept Confidence Score (CCS). These metrics are based on a successful adversarial attack setup that can recover forgotten…
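The abstract does not spell out how CRS and CCS are computed. Purely as an illustration of the concealment-versus-erasure distinction, the sketch below compares latents produced by the original model for a target concept with latents that an attack recovers from the unlearned model, and averages their cosine similarity. The function name `hypothetical_retrieval_score`, the use of cosine similarity, and the pairing of latents are all assumptions made for this example; this is not the paper's definition of CRS.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def hypothetical_retrieval_score(
    original_latents: List[Sequence[float]],
    recovered_latents: List[Sequence[float]],
) -> float:
    """Hypothetical score: mean cosine similarity over paired latents.

    original_latents:  latents from the original (pre-unlearning) model for the target concept.
    recovered_latents: latents from the unlearned model after an attack that tries to
                       recover the concept (assumed to be paired index-by-index).
    A value near 1.0 would suggest the concept was concealed rather than erased.
    """
    if not original_latents or len(original_latents) != len(recovered_latents):
        raise ValueError("need a non-empty, equal number of original and recovered latents")
    sims = [cosine_similarity(o, r) for o, r in zip(original_latents, recovered_latents)]
    return sum(sims) / len(sims)


if __name__ == "__main__":
    # Toy vectors for illustration only, not data from the paper.
    concealed = hypothetical_retrieval_score([[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, 3.0]])
    erased = hypothetical_retrieval_score([[1.0, 0.0]], [[0.0, 1.0]])
    print(concealed, erased)
```

In this toy setup, recovered latents that point in the same direction as the originals score 1.0 (the concept is still retrievable), while orthogonal latents score 0.0. A real evaluation would depend on how the attack, the latents, and the metric are actually defined in the paper.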

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning

Methods: Diffusion · Focus