A Comprehensive Survey on Concept Erasure in Text-to-Image Diffusion Models
Changhoon Kim, Yanjun Qi

TL;DR
This survey reviews concept erasure methods for text-to-image diffusion models. These methods address ethical and legal concerns by preventing a model from generating undesired content, either by modifying its weights or by intervening at inference time. The survey also covers how their effectiveness is evaluated.
Contribution
It categorizes existing concept erasure methods, discusses challenges like adversarial attacks, and consolidates evaluation resources, providing a comprehensive overview of the field.
Findings
Categorized concept erasure techniques into fine-tuning, closed-form solutions, and inference-time interventions.
Highlighted challenges such as adversarial attacks and defenses in concept erasure.
Compiled datasets and metrics for evaluating erasure effectiveness and robustness.
Abstract
Text-to-Image (T2I) models have made remarkable progress in generating high-quality, diverse visual content from natural language prompts. However, their ability to reproduce copyrighted styles, sensitive imagery, and harmful content raises significant ethical and legal concerns. Concept erasure offers a proactive alternative to external filtering by modifying T2I models to prevent the generation of undesired content. In this survey, we provide a structured overview of concept erasure, categorizing existing methods based on their optimization strategies and the architectural components they modify. We categorize concept erasure methods into fine-tuning for parameter updates, closed-form solutions for efficient edits, and inference-time interventions for content restriction without weight modification. Additionally, we explore adversarial attacks that bypass erasure techniques and…
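To make the "closed-form solutions" category more concrete, here is a minimal sketch of what such an edit could look like. It is an illustration under stated assumptions, not a method taken from the survey: it assumes the edited component is a single linear projection `W` (for example a cross-attention key or value projection), that concepts are represented by text-embedding vectors, and that the edit is posed as a ridge-regression problem. The function name `closed_form_erase` and the parameters `lam` and `mu` are hypothetical.

```python
import numpy as np


def closed_form_erase(W, c_erase, c_target, K_preserve, lam=1.0, mu=1e-3):
    """Illustrative closed-form edit of one linear projection (assumed setup).

    Minimizes over W_new:
        ||W_new c_erase - W c_target||^2
        + lam * sum_i ||W_new k_i - W k_i||^2
        + mu  * ||W_new - W||_F^2
    so the erased concept is mapped to where the target concept used to go,
    while the outputs for the preserved embeddings k_i stay close to the original.

    Shapes: W is (d_out, d_in); c_erase and c_target are (d_in,);
    K_preserve is (d_in, n) with one preserved embedding per column.
    """
    d_in = W.shape[1]
    # Shared regularizer: preservation term plus a small ridge term.
    reg = lam * (K_preserve @ K_preserve.T) + mu * np.eye(d_in)
    lhs = np.outer(c_target, c_erase) + reg  # c_target c_erase^T + reg
    rhs = np.outer(c_erase, c_erase) + reg   # c_erase c_erase^T + reg (symmetric, PD)
    # Setting the gradient to zero gives W_new = W @ lhs @ inv(rhs).
    # Solve the linear system instead of forming the inverse explicitly.
    return np.linalg.solve(rhs, (W @ lhs).T).T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(4, 6))
    c_erase, c_target = rng.normal(size=6), rng.normal(size=6)
    K = rng.normal(size=(6, 3))
    W_new = closed_form_erase(W, c_erase, c_target, K)
    print("gap before:", np.linalg.norm(W @ c_erase - W @ c_target))
    print("gap after: ", np.linalg.norm(W_new @ c_erase - W @ c_target))
    print("drift on preserved embeddings:", np.linalg.norm(W_new @ K - W @ K))
```

Because the update is a single linear solve, no gradient-based training loop is needed, which is the efficiency argument the abstract makes for this category. Fine-tuning methods would instead update parameters iteratively, and inference-time interventions would leave `W` untouched.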
