Circumventing Concept Erasure Methods For Text-to-Image Generative Models

Minh Pham; Kelly O. Marshall; Niv Cohen; Govind Mittal; Chinmay Hegde

arXiv:2308.01508 · cs.LG · October 10, 2023 · 2 cites

TL;DR

This paper evaluates five recent concept erasure methods for text-to-image models and shows that none of them fully removes its targeted concepts: learned word embeddings can retrieve the erased content from the sanitized models without altering their weights.

Contribution

The study demonstrates that current concept erasure techniques are insufficient. It uses learned word embeddings to recover erased concepts, which calls into question how far these techniques can be relied on for AI safety.

Findings

01. Targeted concepts are not fully erased by existing methods.

02. Learned embeddings can retrieve erased concepts without model modifications.

03. Post hoc erasure methods are brittle and unreliable.
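
The second finding can be illustrated with a deliberately tiny analogy. The sketch below is not the paper's attack and does not use a diffusion model: a fixed linear map (the hypothetical `frozen_model`) stands in for a sanitized generator, and plain gradient descent searches for an input embedding whose output matches a target that stands in for an "erased" concept. All names, sizes, and hyperparameters are assumptions chosen for illustration. The only point it makes is structural: the optimization updates the embedding alone, and the model weights are never written to.

```python
import random


def frozen_model(weights, embedding):
    """Toy stand-in for a frozen generator: a fixed linear map W @ v."""
    return [sum(w * e for w, e in zip(row, embedding)) for row in weights]


def squared_error(a, b):
    """Sum of squared differences between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def learn_embedding(weights, target, steps=500, lr=0.1, seed=0):
    """Gradient descent on the embedding only; `weights` is read, never updated."""
    rng = random.Random(seed)
    dim = len(weights[0])
    embedding = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
    for _ in range(steps):
        output = frozen_model(weights, embedding)
        residual = [o - t for o, t in zip(output, target)]
        # Gradient of 0.5 * ||W v - y||^2 with respect to v is W^T (W v - y).
        grad = [
            sum(weights[i][j] * residual[i] for i in range(len(weights)))
            for j in range(dim)
        ]
        embedding = [e - lr * g for e, g in zip(embedding, grad)]
    return embedding


# Hypothetical frozen weights and a target output standing in for the erased concept.
WEIGHTS = [[1.0, 0.5, 0.0], [0.0, 1.0, 0.5], [0.5, 0.0, 1.0]]
TARGET = [1.0, -2.0, 0.5]

weights_before = [row[:] for row in WEIGHTS]
learned = learn_embedding(WEIGHTS, TARGET)
final_error = squared_error(frozen_model(WEIGHTS, learned), TARGET)
```

After the loop, `final_error` measures how closely the frozen map reproduces the target from the learned embedding, and comparing `WEIGHTS` with `weights_before` confirms that the model itself was left untouched. In a real text-to-image pipeline the frozen map would be the full sanitized model and the loss would be its training objective, which this sketch does not attempt to reproduce.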

Abstract

Text-to-image generative models can produce photo-realistic images for an extremely broad range of concepts, and their usage has proliferated widely among the general public. On the flip side, these models have numerous drawbacks, including their potential to generate images featuring sexually explicit content, mirror artistic styles without permission, or even hallucinate (or deepfake) the likenesses of celebrities. Consequently, various methods have been proposed in order to "erase" sensitive concepts from text-to-image models. In this work, we examine five recently proposed concept erasure methods, and show that targeted concepts are not fully excised from any of these methods. Specifically, we leverage the existence of special learned word embeddings that can retrieve "erased" concepts from the sanitized models with no alterations to their weights. Our results highlight the…

Code & Models

Repositories

nyu-dice-lab/circumventing-concept-erasure (PyTorch, official)

Videos

Circumventing Concept Erasure Methods For Text-To-Image Generative Models · SlidesLive

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis

Methods: FLIP · High-Order Consensuses