Erasing Concepts, Steering Generations: A Comprehensive Survey of Concept Suppression
Yiwei Xie, Ping Liu, Zheng Zhang

TL;DR
This survey reviews concept erasure techniques for text-to-image diffusion models, which address ethical concerns by selectively removing sensitive concepts while preserving model utility.
Contribution
It provides a comprehensive taxonomy of existing methods, evaluates current benchmarks, and discusses future challenges in concept suppression for generative AI.
Findings
Categorizes approaches by intervention level, optimization structure, and semantic scope.
Highlights trade-offs between erasure specificity, generalization, and computational complexity.
Identifies gaps in evaluation benchmarks and emphasizes the need for robustness and practical effectiveness.
Abstract
Text-to-Image (T2I) models have demonstrated impressive capabilities in generating high-quality and diverse visual content from natural language prompts. However, uncontrolled reproduction of sensitive, copyrighted, or harmful imagery poses serious ethical, legal, and safety challenges. To address these concerns, the concept erasure paradigm has emerged as a promising direction, enabling the selective removal of specific semantic concepts from generative models while preserving their overall utility. This survey provides a comprehensive overview and in-depth synthesis of concept erasure techniques in T2I diffusion models. We systematically categorize existing approaches along three key dimensions: intervention level, which identifies specific model components targeted for concept removal; optimization structure, referring to the algorithmic strategies employed to achieve suppression;…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning
Methods: Diffusion
