Direct Unlearning Optimization for Robust and Safe Text-to-Image Models

Yong-Hyun Park; Sangdoo Yun; Jin-Hwa Kim; Junho Kim; Geonhui Jang; Yonghyun Jeong; Junghyo Jo; Gayoung Lee

arXiv:2407.21035 · cs.CV · January 17, 2025 · 1 citation

TL;DR

This paper introduces DUO, a new method for effectively removing unsafe content from text-to-image models while maintaining their ability to generate safe images, enhancing model safety and reliability.

Contribution

The paper proposes Direct Unlearning Optimization (DUO), a novel framework that robustly removes unsafe content from T2I models without degrading performance on unrelated topics.

Findings

01. DUO defends effectively against adversarial attacks.

02. DUO maintains high image quality on safe content.

03. DUO outperforms existing unlearning methods in both safety and performance.

Abstract

Recent advancements in text-to-image (T2I) models have unlocked a wide range of applications but also present significant risks, particularly in their potential to generate unsafe content. To mitigate this issue, researchers have developed unlearning techniques to remove the model's ability to generate potentially harmful content. However, these methods are easily bypassed by adversarial attacks, making them unreliable for ensuring the safety of generated images. In this paper, we propose Direct Unlearning Optimization (DUO), a novel framework for removing Not Safe For Work (NSFW) content from T2I models while preserving their performance on unrelated topics. DUO employs a preference optimization approach using curated paired image data, ensuring that the model learns to remove unsafe visual concepts while retaining unrelated features. Furthermore, we introduce an output-preserving…
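The preference-optimization idea in the abstract can be illustrated with a small sketch. It assumes a DPO-style objective adapted to diffusion models, in which each training pair holds a safe (preferred) and an unsafe (dispreferred) image and the trainable model is compared against a frozen reference copy through their denoising errors. The function name `preference_unlearning_loss`, its arguments, and `beta` are hypothetical and chosen for illustration; the excerpt above does not give the exact loss DUO uses, so this is not the authors' implementation.

```python
import math


def mse(a, b):
    """Mean squared error between two equal-length sequences of floats."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def preference_unlearning_loss(err_safe_model, err_safe_ref,
                               err_unsafe_model, err_unsafe_ref, beta=1.0):
    """DPO-style loss on denoising errors (illustrative assumption).

    Each err_* is a scalar denoising error (e.g. an MSE between true and
    predicted noise) for the safe or unsafe image of one pair, under the
    trainable model or the frozen reference model.
    """
    # How much worse than the reference the model denoises each image.
    safe_gap = err_safe_model - err_safe_ref
    unsafe_gap = err_unsafe_model - err_unsafe_ref
    # The loss falls when the model keeps denoising the safe image well
    # while denoising the unsafe image worse than the reference does.
    margin = -beta * (safe_gap - unsafe_gap)
    return -log_sigmoid(margin)


# Toy usage with made-up noise vectors (not real model outputs).
true_noise = [0.1, -0.3, 0.5]
loss = preference_unlearning_loss(
    err_safe_model=mse(true_noise, [0.1, -0.2, 0.5]),
    err_safe_ref=mse(true_noise, [0.1, -0.2, 0.5]),
    err_unsafe_model=mse(true_noise, [0.9, 0.4, -0.2]),
    err_unsafe_ref=mse(true_noise, [0.2, -0.3, 0.4]),
)
```

When the model and the reference produce identical errors, the loss equals log 2. Under this assumed objective, it drops below that value only as the model moves away from the reference on the unsafe image without giving up accuracy on the safe one.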

Videos

Direct Unlearning Optimization for Robust and Safe Text-to-Image Models · SlidesLive

Taxonomy

Topics: Brain Tumor Detection and Classification · Medical Imaging Techniques and Applications · Cell Image Analysis Techniques

Methods: Contrastive Language-Image Pre-training