Unlearning Concepts in Diffusion Model via Concept Domain Correction and Concept Preserving Gradient

Yongliang Wu; Shiji Zhou; Mingzhuo Yang; Lianzhe Wang; Heng Chang; Wenbo Zhu; Xinting Hu; Xiao Zhou; Xu Yang

arXiv:2405.15304 · cs.LG · March 19, 2025 · 3 cites

TL;DR

This paper introduces DoCo, a novel framework for unlearning sensitive concepts in diffusion models by aligning concept domains and preserving utility through gradient surgery, effectively removing targeted concepts with minimal utility loss.

Contribution

The paper proposes DoCo, a new concept domain correction method combined with gradient surgery to improve unlearning of sensitive concepts in diffusion models, especially for out-of-distribution prompts.
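
This summary does not give the paper's exact update rule, so the following is only a minimal sketch of the general idea behind gradient surgery, under stated assumptions. It assumes two flattened gradient vectors, one from an unlearning loss and one from a utility-preserving loss, and removes the component of the unlearning gradient that points against the preserving gradient. The function names `dot` and `project_conflicting` are hypothetical, and the projection rule shown is a generic conflict-removal step, not necessarily the one DoCo uses.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def project_conflicting(
    g_unlearn: Sequence[float],
    g_preserve: Sequence[float],
    eps: float = 1e-12,
) -> List[float]:
    """Return an unlearning gradient that no longer opposes g_preserve.

    Assumption (not taken from the paper): two gradients "conflict" when
    their inner product is negative. In that case the component of
    g_unlearn along g_preserve is subtracted, so the result is orthogonal
    to g_preserve. Non-conflicting gradients are returned unchanged.
    """
    inner = dot(g_unlearn, g_preserve)
    norm_sq = dot(g_preserve, g_preserve)
    if inner >= 0 or norm_sq < eps:
        return list(g_unlearn)
    scale = inner / norm_sq
    return [u - scale * p for u, p in zip(g_unlearn, g_preserve)]


# Toy example: the unlearning gradient pushes against the preserving one
# along the second coordinate, so that component is removed.
corrected = project_conflicting([1.0, -1.0], [0.0, 1.0])
```

In a real training loop the two vectors would be the flattened parameter gradients of the respective losses, and the corrected vector would replace the raw unlearning gradient before the optimizer step.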

Findings

01. Effective unlearning of sensitive concepts across various styles and prompts.

02. Minimal impact on model utility after unlearning.

03. Outperforms previous methods in out-of-distribution scenarios.

Abstract

Text-to-image diffusion models have achieved remarkable success in generating photorealistic images. However, the inclusion of sensitive information during pre-training poses significant risks. Machine Unlearning (MU) offers a promising solution to eliminate sensitive concepts from these models. Despite its potential, existing MU methods face two main challenges: 1) limited generalization, where concept erasure is effective only within the unlearned set, failing to prevent sensitive concept generation from out-of-set prompts; and 2) utility degradation, where removing target concepts significantly impacts the model's overall performance. To address these issues, we propose a novel concept domain correction framework named DoCo (Domain Correction). By aligning the output domains of sensitive and anchor concepts through adversarial training, our approach ensures…

Taxonomy

Topics: Text and Document Classification Technologies · Machine Learning and Data Classification · Neural Networks and Applications

Methods: Sparse Evolutionary Training · Diffusion