Boosting Alignment for Post-Unlearning Text-to-Image Generative Models
Myeongseob Ko, Henry Li, Zhun Wang, Jonathan Patsenker, Jiachen T. Wang, Qinbin Li, Ming Jin, Dawn Song, Ruoxi Jia

TL;DR
This paper introduces a novel unlearning framework for text-to-image generative models that effectively removes undesirable content while preserving model alignment, outperforming existing methods.
Contribution
The authors propose a framework that seeks an optimal model update at each unlearning iteration, so that both unlearning quality and text-image alignment improve monotonically.
Findings
Successfully removes target concepts from diffusion models
Maintains close alignment with original trained models
Outperforms state-of-the-art unlearning baselines
Abstract
Large-scale generative models have shown impressive image-generation capabilities, propelled by massive data. However, this often inadvertently leads to the generation of harmful or inappropriate content and raises copyright concerns. Driven by these concerns, machine unlearning has become crucial to effectively purge undesirable knowledge from models. While existing literature has studied various unlearning techniques, these often suffer from either poor unlearning quality or degradation in text-image alignment after unlearning, due to the competitive nature of these objectives. To address these challenges, we propose a framework that seeks an optimal model update at each unlearning iteration, ensuring monotonic improvement on both objectives. We further derive the characterization of such an update. In addition, we design procedures to strategically diversify the unlearning and…
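The idea of an update that improves two competing objectives at once can be illustrated with a small sketch. The code below is not the update derived in the paper (the abstract does not spell it out); it is a generic gradient-projection construction, and the names `grad_forget`, `grad_retain`, and `joint_descent_direction` are hypothetical. It assumes the two objectives are expressed as losses whose gradients are available as flat lists of floats. Each gradient has its component along the other removed, and the two results are summed, so a small step along the negative of the returned direction does not increase either loss to first order.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def project_out(g: Vector, h: Vector) -> Vector:
    """Return g with its component along h removed (g unchanged if h is zero)."""
    hh = dot(h, h)
    if hh == 0.0:
        return list(g)
    scale = dot(g, h) / hh
    return [gi - scale * hi for gi, hi in zip(g, h)]


def joint_descent_direction(grad_forget: Vector, grad_retain: Vector) -> Vector:
    """Illustrative direction d with <d, grad_forget> >= 0 and <d, grad_retain> >= 0.

    Assumption: this is a generic projection scheme used for illustration,
    not the characterization derived in the paper.
    """
    forget_part = project_out(grad_forget, grad_retain)
    retain_part = project_out(grad_retain, grad_forget)
    return [f + r for f, r in zip(forget_part, retain_part)]


# Two conflicting gradients: their inner product is negative.
g_f = [1.0, 0.0]   # hypothetical gradient of the unlearning loss
g_r = [-1.0, 1.0]  # hypothetical gradient of the alignment (retention) loss
d = joint_descent_direction(g_f, g_r)

# Both inner products are non-negative, so a step along -d
# does not increase either loss to first order.
print(dot(d, g_f) >= 0.0, dot(d, g_r) >= 0.0)
```

The non-negativity of both inner products follows from the Cauchy-Schwarz inequality, which is why the construction works even when the raw gradients point in opposing directions.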
Taxonomy
Topics: Image Retrieval and Classification Techniques
Methods: Diffusion
