Boosting Alignment for Post-Unlearning Text-to-Image Generative Models

Myeongseob Ko; Henry Li; Zhun Wang; Jonathan Patsenker; Jiachen T. Wang; Qinbin Li; Ming Jin; Dawn Song; Ruoxi Jia

arXiv:2412.07808·cs.LG·March 11, 2025

TL;DR

This paper introduces an unlearning framework for text-to-image generative models that removes undesirable content while preserving text-image alignment, and it outperforms existing unlearning methods.

Contribution

The authors propose a framework that seeks an optimal model update at each unlearning iteration, ensuring monotonic improvement in both unlearning quality and text-image alignment.

Findings

01 Successfully removes target concepts from diffusion models

02 Maintains close alignment with original trained models

03 Outperforms state-of-the-art unlearning baselines

Abstract

Large-scale generative models have shown impressive image-generation capabilities, propelled by massive data. However, this often inadvertently leads to the generation of harmful or inappropriate content and raises copyright concerns. Driven by these concerns, machine unlearning has become crucial to effectively purge undesirable knowledge from models. While existing literature has studied various unlearning techniques, these often suffer from either poor unlearning quality or degradation in text-image alignment after unlearning, due to the competitive nature of these objectives. To address these challenges, we propose a framework that seeks an optimal model update at each unlearning iteration, ensuring monotonic improvement on both objectives. We further derive the characterization of such an update. In addition, we design procedures to strategically diversify the unlearning and…
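The idea of an update that improves both objectives at once can be illustrated with a small sketch. The code below is not taken from the paper or its repository; it shows one simple direction with the stated property, built by removing from each gradient its component along the other and summing the two remainders. The function names (`dot`, `restricted_direction`), the `eps` parameter, and the toy gradients are all hypothetical, chosen only for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def restricted_direction(g_forget, g_retain, eps=1e-12):
    """Illustrative direction d with dot(d, g_forget) >= 0 and dot(d, g_retain) >= 0.

    Assumption: g_forget and g_retain are gradients of a forgetting loss and
    an alignment (retain) loss with respect to the same parameters. Stepping
    along -d then does not increase either loss to first order.
    """
    c = dot(g_forget, g_retain)
    nf = dot(g_forget, g_forget)
    nr = dot(g_retain, g_retain)
    # Part of the forget gradient that is orthogonal to the retain gradient.
    f_perp = [gf - (c / (nr + eps)) * gr for gf, gr in zip(g_forget, g_retain)]
    # Part of the retain gradient that is orthogonal to the forget gradient.
    r_perp = [gr - (c / (nf + eps)) * gf for gf, gr in zip(g_forget, g_retain)]
    return [a + b for a, b in zip(f_perp, r_perp)]


# Toy gradients that conflict: their inner product is negative.
g_forget = [1.0, 0.0]
g_retain = [-2.0, 1.0]

# Naively summing the gradients gives a direction that opposes g_forget.
naive = [a + b for a, b in zip(g_forget, g_retain)]

d = restricted_direction(g_forget, g_retain)

print("naive . g_forget =", dot(naive, g_forget))
print("d . g_forget     =", dot(d, g_forget))
print("d . g_retain     =", dot(d, g_retain))
```

By the Cauchy-Schwarz inequality, both inner products with `d` are non-negative for any pair of nonzero gradients, whereas the naive sum can point against one of them, as it does in this toy case. In a real diffusion model the lists would be replaced by flattened parameter gradients.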

Code & Models

Repositories

reds-lab/restricted_gradient_diversity_unlearning
PyTorch · Official

Taxonomy

Topics: Image Retrieval and Classification Techniques

Methods: Diffusion