Dark Miner: Defend against undesirable generation for text-to-image diffusion models
Zheling Meng, Bo Peng, Xiaochuan Jin, Yue Jiang, Wei Wang, Jing Dong, Tieniu Tan

TL;DR
Dark Miner is a method for erasing undesired concepts from text-to-image diffusion models and for defending against their generation, especially under adversarial attacks. It works by mining the embeddings that have a high probability of generating the target concept and then circumventing them.
Contribution
It introduces a recurring three-stage process (mining, verifying, and circumventing) for erasing undesired concepts. The authors report that it outperforms existing methods in both robustness and preservation of the model's native generation capabilities.
Findings
Better erasure of undesired concepts compared to previous methods
More robust against multiple adversarial attacks
Preserves native generation capabilities of models
Abstract
Text-to-image diffusion models have been shown to produce undesired content, such as sexual images and copyrighted material, because of unfiltered large-scale training data, which makes it necessary to erase undesired concepts. Most existing methods focus on modifying the generation probabilities conditioned on texts that contain the target concepts. However, they cannot guarantee the desired generation for texts unseen in the training phase, especially adversarial texts from malicious attacks. In this paper, we analyze the erasure task and point out that existing methods cannot guarantee the minimization of the total probabilities of undesired generation. To tackle this problem, we propose Dark Miner. It entails a recurring three-stage process that comprises mining, verifying, and circumventing. This method greedily mines embeddings with maximum generation probabilities of target concepts and…
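The recurring mine, verify, circumvent loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the real method operates on a diffusion model and its text embeddings, whereas here the "model" is a single weight vector w and the "generation probability" of the target concept is replaced by the linear score dot(w, e). All function names, the threshold, and the learning rates are hypothetical and chosen only for illustration.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

def concept_score(w, e):
    # Toy stand-in (assumption): how strongly embedding e elicits the concept.
    return dot(w, e)

def mine(w, rng, steps=50, lr=0.5):
    # Stage 1: greedily search for a unit-norm embedding with a high score,
    # using projected gradient ascent (the gradient of dot(w, e) w.r.t. e is w).
    e = normalize([rng.gauss(0, 1) for _ in w])
    for _ in range(steps):
        e = normalize([ei + lr * wi for ei, wi in zip(e, w)])
    return e

def verify(w, e, threshold):
    # Stage 2: check whether the mined embedding still triggers the concept.
    return concept_score(w, e) > threshold

def circumvent(w, e, lr=0.3):
    # Stage 3: update the model to lower the score at the mined embedding
    # (one gradient-descent step on 0.5 * score**2 with respect to w).
    s = concept_score(w, e)
    return [wi - lr * s * ei for wi, ei in zip(w, e)]

def erase_concept(w, threshold=0.1, max_rounds=100, seed=0):
    # Repeat the three stages until mining no longer finds a triggering embedding.
    rng = random.Random(seed)
    rounds = 0
    for _ in range(max_rounds):
        e = mine(w, rng)
        if not verify(w, e, threshold):
            break
        w = circumvent(w, e)
        rounds += 1
    return w, rounds

if __name__ == "__main__":
    w0 = [3.0, -4.0, 0.0]
    w_final, rounds = erase_concept(w0)
    print("rounds:", rounds)
    print("norm before:", math.sqrt(dot(w0, w0)))
    print("norm after:", math.sqrt(dot(w_final, w_final)))
```

The toy omits any term for preserving the model's native generation capabilities, which the summary lists as one of the method's findings; a faithful version would need to constrain the circumventing update accordingly.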
Taxonomy
Topics: Mathematics, Computing, and Information Processing
Methods: Diffusion · Focus
