M-ErasureBench: A Comprehensive Multimodal Evaluation Benchmark for Concept Erasure in Diffusion Models

Ju-Hsuan Weng; Jia-Wei Liao; Cheng-Fu Chou; Jun-Cheng Chen

arXiv:2512.22877 · cs.CV · December 30, 2025

TL;DR

This paper introduces M-ErasureBench, a comprehensive benchmark for evaluating concept erasure in diffusion models across multiple input modalities, revealing vulnerabilities and proposing IRECE to enhance robustness.

Contribution

It presents the first multimodal evaluation framework for concept erasure in diffusion models and introduces IRECE, a method to improve erasure robustness across different input types.

Findings

01

Existing methods perform well on text prompts but poorly on learned embeddings and inverted latents.

02

The Concept Reproduction Rate (CRR) exceeds 90% in white-box scenarios when no robust defense is applied.

03

IRECE reduces CRR by up to 40% while maintaining visual quality.
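The findings above are stated in terms of CRR. A minimal sketch of how such a rate could be computed, assuming CRR is the fraction of generated images in which a detector still finds the erased concept, is shown here. The function name `concept_reproduction_rate` and the `contains_concept` detector are hypothetical, and the benchmark's exact definition and detector may differ.

```python
from typing import Callable, Sequence


def concept_reproduction_rate(
    images: Sequence[object],
    contains_concept: Callable[[object], bool],
) -> float:
    """Fraction of generated images in which the erased concept is detected.

    Assumption: `contains_concept` is a hypothetical detector (e.g. a classifier
    for the erased object or style); it is not taken from the paper.
    """
    if not images:
        raise ValueError("need at least one image")
    hits = sum(1 for img in images if contains_concept(img))
    return hits / len(images)


# Toy usage: stand-in "images" are booleans marking whether the concept reappeared.
outputs = [True, True, False, True, False, True, True, False, True, True]
crr = concept_reproduction_rate(outputs, contains_concept=lambda x: x)
print(f"CRR = {crr:.0%}")  # CRR = 70%
```

Under this reading, a lower CRR means the erasure held up better against the given input modality.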

Abstract

Text-to-image diffusion models may generate harmful or copyrighted content, motivating research on concept erasure. However, existing approaches primarily focus on erasing concepts from text prompts, overlooking other input modalities that are increasingly critical in real-world applications such as image editing and personalized generation. These modalities can become attack surfaces, where erased concepts re-emerge despite defenses. To bridge this gap, we introduce M-ErasureBench, a novel multimodal evaluation framework that systematically benchmarks concept erasure methods across three input modalities: text prompts, learned embeddings, and inverted latents. For the latter two, we evaluate both white-box and black-box access, yielding five evaluation scenarios. Our analysis shows that existing methods achieve strong erasure performance against text prompts but largely fail under…
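The count of five scenarios follows from the abstract: text prompts form one scenario, while learned embeddings and inverted latents are each evaluated under white-box and black-box access (1 + 2 + 2 = 5). The sketch below enumerates them. The function name and the use of `None` for the text-prompt access mode are illustrative choices, not the benchmark's code.

```python
from itertools import product
from typing import List, Optional, Tuple

# Input modalities named in the abstract.
TEXT = "text prompt"
EMBEDDING = "learned embedding"
LATENT = "inverted latent"


def evaluation_scenarios() -> List[Tuple[str, Optional[str]]]:
    """Enumerate (modality, access) pairs as described in the abstract.

    Text prompts are evaluated once (access left as None here, a hypothetical
    convention); embeddings and latents are each evaluated white-box and black-box.
    """
    scenarios: List[Tuple[str, Optional[str]]] = [(TEXT, None)]
    scenarios += list(product([EMBEDDING, LATENT], ["white-box", "black-box"]))
    return scenarios


for modality, access in evaluation_scenarios():
    print(modality if access is None else f"{modality} ({access})")
```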
