Rethinking Robust Adversarial Concept Erasure in Diffusion Models
Qinghong Yin, Yu Tian, Heming Yang, Xiang Chen, Xianlin Zhang, Xueming Li, Yue Zhan

TL;DR
This paper introduces S-GRACE, a semantics-guided adversarial concept erasure method for diffusion models that significantly improves unlearning of sensitive content while preserving other concepts and reducing training time.
Contribution
It proposes a novel semantics-guided approach for adversarial concept erasure in diffusion models, addressing limitations of existing methods.
Findings
S-GRACE improves erasure performance by 26%.
It better preserves non-target concepts.
It reduces training time by 90%.
Abstract
Concept erasure aims to selectively unlearn undesirable content in diffusion models (DMs) to reduce the risk of sensitive content generation. As a novel paradigm in concept erasure, most existing methods employ adversarial training to identify and suppress target concepts, thus reducing the likelihood of sensitive outputs. However, these methods often neglect the specificity of adversarial training in DMs, resulting in only partial mitigation. In this work, we investigate and quantify this specificity from the perspective of concept space, i.e., can adversarial samples truly fit the target concept space? We observe that existing methods neglect the role of conceptual semantics when generating adversarial samples, resulting in ineffective fitting of concept spaces. This oversight leads to the following issues: 1) when there are few adversarial samples, they fail to comprehensively…
Taxonomy
Topics: Misinformation and Its Impacts · Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning
