Direct Unlearning Optimization for Robust and Safe Text-to-Image Models
Yong-Hyun Park, Sangdoo Yun, Jin-Hwa Kim, Junho Kim, Geonhui Jang, Yonghyun Jeong, Junghyo Jo, Gayoung Lee

TL;DR
This paper introduces DUO, a new method for effectively removing unsafe content from text-to-image models while maintaining their ability to generate safe images, enhancing model safety and reliability.
Contribution
The paper proposes Direct Unlearning Optimization (DUO), a novel framework that robustly removes unsafe content from T2I models without degrading performance on unrelated topics.
Findings
DUO defends against adversarial attacks that bypass earlier unlearning methods.
DUO maintains high image quality on safe content.
DUO outperforms existing unlearning methods in both safety and performance.
Abstract
Recent advancements in text-to-image (T2I) models have unlocked a wide range of applications but also present significant risks, particularly in their potential to generate unsafe content. To mitigate this issue, researchers have developed unlearning techniques to remove the model's ability to generate potentially harmful content. However, these methods are easily bypassed by adversarial attacks, making them unreliable for ensuring the safety of generated images. In this paper, we propose Direct Unlearning Optimization (DUO), a novel framework for removing Not Safe For Work (NSFW) content from T2I models while preserving their performance on unrelated topics. DUO employs a preference optimization approach using curated paired image data, ensuring that the model learns to remove unsafe visual concepts while retaining unrelated features. Furthermore, we introduce an output-preserving…
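The abstract describes the preference optimization step only at a high level, so the following is a minimal sketch of what a DPO-style preference loss over paired images could look like, not the authors' implementation. It assumes each pair consists of an unsafe image and a curated safe counterpart, and that per-sample denoising errors have already been computed for both the model being trained and a frozen reference copy. The function names, the argument names, and the `beta` parameter are hypothetical and do not come from the paper.

```python
import math


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def preference_loss(err_safe, err_safe_ref, err_unsafe, err_unsafe_ref, beta=1.0):
    """DPO-style loss over paired denoising errors (hypothetical sketch).

    Each argument is a list of per-sample denoising errors:
      err_safe / err_unsafe         -> errors of the model being trained
      err_safe_ref / err_unsafe_ref -> errors of a frozen reference model
    """
    losses = []
    for es, esr, eu, eur in zip(err_safe, err_safe_ref, err_unsafe, err_unsafe_ref):
        # Change in error relative to the reference; negative means the
        # trained model fits that image better than the reference does.
        safe_gap = es - esr
        unsafe_gap = eu - eur
        # The margin grows when the model fits the safe image better and
        # the unsafe image worse than the reference.
        margin = -beta * (safe_gap - unsafe_gap)
        # -log(sigmoid(margin)) written as softplus(-margin).
        losses.append(softplus(-margin))
    return sum(losses) / len(losses)
```

Under these assumptions, the loss equals log 2 when the trained model matches the reference on both images, and it decreases as the model's error falls on the safe image and rises on the unsafe one. The output-preserving component mentioned at the end of the abstract is not covered by this sketch.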
Taxonomy
Topics: Brain Tumor Detection and Classification · Medical Imaging Techniques and Applications · Cell Image Analysis Techniques
Methods: Contrastive Language-Image Pre-training
