Prototype-Guided Concept Erasure in Diffusion Models

Yuze Cai; Jiahao Lu; Hongxiang Shi; Yichao Zhou; Hong Lu

arXiv:2603.08271·cs.CV·March 10, 2026

Open Access

TL;DR

This paper introduces a prototype-guided method for erasing broad concepts in diffusion models: it clusters the model's concept-encoding embeddings into prototypes and uses them as negative conditioning signals at inference, aiming for more reliable and safer image generation.

Contribution

The paper proposes clustering embeddings to identify concept prototypes, with the goal of erasing broad concepts in diffusion models more effectively.

Findings

1. Enhanced erasure of broad concepts like "sexual" or "violent"
2. Preserved image quality after concept removal
3. Outperformed existing methods on multiple benchmarks

Abstract

Concept erasure is extensively utilized in image generation to prevent text-to-image models from generating undesired content. Existing methods can effectively erase narrow concepts that are specific and concrete, such as distinct intellectual properties (e.g., Pikachu) or recognizable characters (e.g., Elon Musk). However, their performance degrades on broad concepts such as "sexual" or "violent", whose wide scope and multi-faceted nature make them difficult to erase reliably. To overcome this limitation, we exploit the model's intrinsic embedding geometry to identify latent embeddings that encode a given concept. By clustering these embeddings, we derive a set of concept prototypes that summarize the model's internal representations of the concept, and employ them as negative conditioning signals during inference to achieve precise and reliable erasure. Extensive experiments across…
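The two steps named in the abstract (clustering embeddings into prototypes, then using them as negative conditioning) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not say which clustering algorithm, embedding space, or guidance rule the paper uses. Plain k-means and the standard classifier-free-guidance combination are swapped in here as assumptions, and the names `kmeans_prototypes` and `guided_noise` are hypothetical. The denoiser itself is not modeled; `eps_prompt` and `eps_negative` stand in for its noise predictions under the prompt and under a prototype embedding.

```python
import numpy as np


def kmeans_prototypes(embeddings, k, iters=50, seed=0):
    """Plain k-means over concept embeddings (an assumed stand-in).

    embeddings: array of shape (n, d).
    Returns an array of shape (k, d) whose rows play the role of prototypes.
    """
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct data points.
    idx = rng.choice(len(embeddings), size=k, replace=False)
    centroids = embeddings[idx].astype(float)
    for _ in range(iters):
        # Squared distance from every point to every centroid: shape (n, k).
        diff = embeddings[:, None, :] - centroids[None, :, :]
        labels = (diff ** 2).sum(axis=-1).argmin(axis=1)
        for j in range(k):
            members = embeddings[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids


def guided_noise(eps_prompt, eps_negative, scale=7.5):
    """Classifier-free-guidance-style combination with a negative branch.

    Moves the prediction away from eps_negative (the denoiser output under
    the negative conditioning) and toward eps_prompt. The scale value is an
    illustrative default, not a setting taken from the paper.
    """
    return eps_negative + scale * (eps_prompt - eps_negative)
```

In a real pipeline the rows returned by `kmeans_prototypes` would be passed to the denoiser in place of the usual empty or negative prompt embedding. How several prototypes are combined at inference is not specified in the abstract, so the sketch leaves that choice open.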

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning