R.A.C.E.: Robust Adversarial Concept Erasure for Secure Text-to-Image Diffusion Model

Changhoon Kim; Kyle Min; Yezhou Yang

arXiv:2405.16341·cs.CV·July 24, 2024

Open Access

TL;DR

RACE introduces an adversarial training framework to improve the robustness of concept erasure in text-to-image diffusion models, significantly reducing the risk of generating sensitive or inappropriate content.

Contribution

The paper presents an adversarial training approach that makes concept erasure in T2I models more robust to adversarial prompts that try to regenerate an erased concept.

Findings

01  30 percentage point reduction in attack success rate (ASR) for the "nudity" concept

02  Effective defense against white-box and black-box attacks

03  Improved safety in text-to-image generation
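
The first finding is stated in percentage points, which is an absolute difference between two ASR values rather than a relative percentage. A minimal sketch of the distinction follows; the outcome lists are hypothetical (not data from the paper), and `attack_success_rate` is an illustrative helper, not part of any RACE code.

```python
def attack_success_rate(outcomes):
    """Percentage of attack prompts that still produced the erased concept."""
    if not outcomes:
        raise ValueError("need at least one attack outcome")
    return 100.0 * sum(bool(o) for o in outcomes) / len(outcomes)

# Hypothetical outcomes (True = the attack succeeded); not results from the paper.
baseline = [True] * 7 + [False] * 3   # erased model without extra defense
defended = [True] * 4 + [False] * 6   # same attack after a robustness defense

asr_before = attack_success_rate(baseline)
asr_after = attack_success_rate(defended)

# Absolute change, in percentage points.
point_reduction = asr_before - asr_after
# Relative change, in percent of the original ASR.
relative_reduction = 100.0 * point_reduction / asr_before
```

With these made-up numbers the two measures differ noticeably, which is why a reduction reported in percentage points should not be read as a relative drop.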

Abstract

In the evolving landscape of text-to-image (T2I) diffusion models, the remarkable capability to generate high-quality images from textual descriptions is challenged by the potential for misuse in reproducing sensitive content. To address this critical issue, we introduce Robust Adversarial Concept Erase (RACE), a novel approach designed to mitigate these risks by enhancing the robustness of concept erasure methods for T2I models. RACE utilizes a sophisticated adversarial training framework to identify and mitigate adversarial text embeddings, significantly reducing the Attack Success Rate (ASR). Impressively, RACE achieves a 30 percentage point reduction in ASR for the "nudity" concept against the leading white-box attack method. Our extensive evaluations demonstrate RACE's effectiveness in defending against both white-box and black-box attacks,…
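
The abstract describes an adversarial training loop: find a text embedding that defeats the erasure, then update the model against it. The sketch below shows only that general min-max pattern on a toy problem. Everything in it is an assumption for illustration: a linear scorer `w @ e` stands in for the diffusion model, the squared score stands in for the erasure loss, and the function names are hypothetical. It is not the RACE objective or implementation.

```python
import numpy as np

def find_adversarial_embedding(w, e, eps=0.1, step=0.02, iters=10):
    """Inner step: signed-gradient ascent on the toy score w @ (e + delta),
    keeping every component of delta within [-eps, eps]."""
    delta = np.zeros_like(e)
    for _ in range(iters):
        grad = w  # gradient of w @ (e + delta) w.r.t. delta (constant here)
        delta = np.clip(delta + step * np.sign(grad), -eps, eps)
    return e + delta

def defender_step(w, e_adv, lr=0.1):
    """Outer step: one gradient step on 0.5 * (w @ e_adv)**2 w.r.t. w."""
    score = w @ e_adv
    return w - lr * score * e_adv

# Hypothetical toy values, chosen only for illustration.
w = np.array([1.0, -2.0, 0.5, 1.5])   # stand-in for model parameters
e = np.array([0.5, -0.3, 0.8, 0.1])   # stand-in for a prompt's text embedding

e_adv = find_adversarial_embedding(w, e)   # attacker perturbs the embedding
w_new = defender_step(w, e_adv)            # defender lowers the score on it

# Alternate attacker and defender for a few rounds.
w_trained = w.copy()
for _ in range(20):
    w_trained = defender_step(w_trained, find_adversarial_embedding(w_trained, e))
```

The design point the toy keeps is that the perturbation is searched in embedding space under a norm bound, and the defender trains on the perturbed embedding, not the clean one.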

Taxonomy

Topics: Digital Media Forensic Detection

Methods: Diffusion