Prototype-Guided Concept Erasure in Diffusion Models
Yuze Cai, Jiahao Lu, Hongxiang Shi, Yichao Zhou, Hong Lu

TL;DR
This paper introduces a prototype-guided method for erasing broad concepts in diffusion models: it clusters the model's latent embeddings to derive concept prototypes, then uses those prototypes as negative conditioning signals at inference to make erasure more reliable.
Contribution
It proposes clustering the model's own embeddings to derive concept prototypes, which summarize how the model internally represents a broad concept and make erasing that concept more effective.
Findings
Enhanced erasure of broad concepts like 'sexual' or 'violent'
Preserved image quality after concept removal
Outperformed existing methods on multiple benchmarks
Abstract
Concept erasure is extensively utilized in image generation to prevent text-to-image models from generating undesired content. Existing methods can effectively erase narrow concepts that are specific and concrete, such as distinct intellectual properties (e.g., Pikachu) or recognizable characters (e.g., Elon Musk). However, their performance degrades on broad concepts such as "sexual" or "violent", whose wide scope and multi-faceted nature make them difficult to erase reliably. To overcome this limitation, we exploit the model's intrinsic embedding geometry to identify latent embeddings that encode a given concept. By clustering these embeddings, we derive a set of concept prototypes that summarize the model's internal representations of the concept, and employ them as negative conditioning signals during inference to achieve precise and reliable erasure. Extensive experiments across…
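The two steps the abstract describes, clustering concept embeddings into prototypes and using them as negative conditioning, can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not name the clustering algorithm or say how several prototypes are combined, so plain k-means and simple averaging of the negative predictions are assumptions made here for illustration. The function names `kmeans_prototypes` and `guided_noise` are hypothetical. Plain Python lists stand in for the embedding and noise tensors a real diffusion pipeline would supply.

```python
import random


def kmeans_prototypes(embeddings, k, iters=20, seed=0):
    """Cluster embedding vectors and return k centroids ("prototypes").

    Assumption: plain k-means; the abstract does not specify the algorithm.
    """
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(embeddings, k)]
    for _ in range(iters):
        # Assign each embedding to its nearest centroid (squared Euclidean).
        clusters = [[] for _ in range(k)]
        for v in embeddings:
            dists = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in centroids]
            clusters[dists.index(min(dists))].append(v)
        # Move each centroid to the mean of its members; keep it if empty.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def guided_noise(eps_pos, eps_negs, scale):
    """Combine noise predictions in the usual negative-prompt guidance form:
    eps = eps_neg + scale * (eps_pos - eps_neg).

    eps_negs holds one prediction per prototype. Averaging them into a single
    eps_neg is an assumption of this sketch, not a detail from the paper.
    """
    eps_neg = [sum(col) / len(eps_negs) for col in zip(*eps_negs)]
    return [n + scale * (p - n) for p, n in zip(eps_pos, eps_neg)]


if __name__ == "__main__":
    # Toy 2-D "embeddings" forming two well-separated groups.
    toy = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
    print(sorted(kmeans_prototypes(toy, k=2)))
    print(guided_noise([1.0, 2.0], [[0.0, 0.0], [2.0, 0.0]], scale=2.0))
```

In a real text-to-image pipeline, `eps_pos` would be the denoiser's prediction under the user prompt and each entry of `eps_negs` its prediction when conditioned on one prototype. Several prototypes can cover different facets of a broad concept, which is the motivation the abstract gives for using more than one negative signal.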
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
