Memories of Forgotten Concepts
Matan Rusanovsky, Shimon Malnick, Amir Jevnisek, Ohad Fried, Shai Avidan

TL;DR
This paper demonstrates that erased concept information in diffusion models persists and can be reconstructed using inversion methods, revealing vulnerabilities in current concept ablation techniques.
Contribution
The paper shows that images of erased concepts can still be generated from suitable latent seeds found via inversion, which challenges the effectiveness of existing concept ablation methods.
Findings
Erased concept images can be generated from specific latent seeds.
The likelihoods of these latents overlap with those of latents for images outside the erased concept.
Complete erasure of concept information may be fundamentally intractable.
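The likelihood-overlap finding can be made concrete with a small sketch. The summary does not say how the paper scores latents, so this assumes the usual standard normal prior N(0, I) over diffusion latents and scores each latent by its log-density under that prior. The function names, the latent shape, and the random stand-in latents are all hypothetical; real use would substitute latents obtained by inverting actual images.

```python
import numpy as np


def gaussian_log_likelihood(z):
    """Log-density of a latent under N(0, I), treating all entries as one vector."""
    z = np.asarray(z, dtype=np.float64).ravel()
    d = z.size
    return -0.5 * (d * np.log(2.0 * np.pi) + np.dot(z, z))


def ranges_overlap(a, b):
    """True if the [min, max] intervals of two score collections intersect."""
    return max(min(a), min(b)) <= min(max(a), max(b))


# Random stand-ins only: these are NOT inverted latents from any model.
rng = np.random.default_rng(0)
latent_shape = (4, 64, 64)  # assumed shape, chosen for illustration
erased_scores = [
    gaussian_log_likelihood(rng.standard_normal(latent_shape)) for _ in range(8)
]
other_scores = [
    gaussian_log_likelihood(rng.standard_normal(latent_shape)) for _ in range(8)
]

print("score ranges overlap:", ranges_overlap(erased_scores, other_scores))
```

If latents recovered for erased-concept images score in the same range as latents for unrelated images, a simple likelihood threshold cannot separate the two groups, which is the sense of "overlap" used above.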
Abstract
Diffusion models dominate the space of text-to-image generation, yet they may produce undesirable outputs, including explicit content or private data. To mitigate this, concept ablation techniques have been explored to limit the generation of certain concepts. In this paper, we reveal that the erased concept information persists in the model and that erased concept images can be generated using the right latent. Utilizing inversion methods, we show that there exist latent seeds capable of generating high quality images of erased concepts. Moreover, we show that these latents have likelihoods that overlap with those of images outside the erased concept. We extend this to demonstrate that for every image from the erased concept set, we can generate many seeds that generate the erased concept. Given the vast space of latents capable of generating ablated concept images, our results suggest…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Misinformation and Its Impacts · Topic Modeling
