M-ErasureBench: A Comprehensive Multimodal Evaluation Benchmark for Concept Erasure in Diffusion Models
Ju-Hsuan Weng, Jia-Wei Liao, Cheng-Fu Chou, Jun-Cheng Chen

TL;DR
This paper introduces M-ErasureBench, a benchmark that evaluates concept erasure in diffusion models across three input modalities (text prompts, learned embeddings, and inverted latents). It shows that erased concepts can re-emerge through non-text inputs and proposes IRECE to make erasure more robust.
Contribution
It presents the first multimodal evaluation framework for concept erasure in diffusion models and introduces IRECE, a method to improve erasure robustness across different input types.
Findings
Existing methods perform well on text prompts but poorly on learned embeddings and latents.
Concept Reproduction Rate (CRR) exceeds 90% in white-box scenarios without robust defenses.
IRECE reduces CRR by up to 40% while maintaining visual quality.
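The CRR figures above can be made concrete with a minimal sketch. This summary does not define CRR, so the code assumes it is the fraction of generated images in which a concept detector flags the erased concept. The function name `concept_reproduction_rate` and the detector flags are hypothetical and do not come from the paper.

```python
from typing import Sequence


def concept_reproduction_rate(detected: Sequence[bool]) -> float:
    """Fraction of generated images in which the erased concept was detected.

    Assumption: CRR = (# images flagged as containing the concept) / (# images).
    The paper's exact definition may differ.
    """
    if len(detected) == 0:
        raise ValueError("need at least one generated image")
    return sum(detected) / len(detected)


# Hypothetical detector outputs for 10 generated images (not real results).
flags = [True, True, False, True, True, True, True, True, True, True]
crr = concept_reproduction_rate(flags)
print(f"CRR = {crr:.0%}")
```

Under this reading, a lower CRR means the erased concept reappears less often, which is the direction in which IRECE is reported to move it.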
Abstract
Text-to-image diffusion models may generate harmful or copyrighted content, motivating research on concept erasure. However, existing approaches primarily focus on erasing concepts from text prompts, overlooking other input modalities that are increasingly critical in real-world applications such as image editing and personalized generation. These modalities can become attack surfaces, where erased concepts re-emerge despite defenses. To bridge this gap, we introduce M-ErasureBench, a novel multimodal evaluation framework that systematically benchmarks concept erasure methods across three input modalities: text prompts, learned embeddings, and inverted latents. For the latter two, we evaluate both white-box and black-box access, yielding five evaluation scenarios. Our analysis shows that existing methods achieve strong erasure performance against text prompts but largely fail under…
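The count of five evaluation scenarios follows from the abstract: text prompts form one scenario, and the other two modalities are each evaluated under white-box and black-box access. The enumeration below is an illustrative sketch only; the identifiers (`text_prompt`, `white_box`, and so on) are hypothetical labels, not names used by the benchmark.

```python
from itertools import product
from typing import List, Optional, Tuple

# Modalities that the abstract says are evaluated under both access levels.
MODALITIES_WITH_ACCESS = ("learned_embedding", "inverted_latent")
ACCESS_LEVELS = ("white_box", "black_box")


def evaluation_scenarios() -> List[Tuple[str, Optional[str]]]:
    """Enumerate (modality, access level) pairs; text prompts have no access split."""
    scenarios: List[Tuple[str, Optional[str]]] = [("text_prompt", None)]
    scenarios.extend(product(MODALITIES_WITH_ACCESS, ACCESS_LEVELS))
    return scenarios


for modality, access in evaluation_scenarios():
    print(modality, access or "-")
```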
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning
