Erased or Dormant? Rethinking Concept Erasure Through Reversibility
Ping Liu, Chi Zhang

TL;DR
This paper critically evaluates current concept erasure techniques for diffusion models and finds that they often only superficially suppress a concept rather than remove the model's capacity to generate it, so the concept can reemerge after minimal adaptation.
Contribution
The study introduces an instance-level evaluation method that tests whether a concept has truly been erased, and demonstrates that existing techniques often fail to achieve irreversible removal.
Findings
Erased concepts can reemerge after minimal fine-tuning
Current methods often only superficially suppress targeted concepts
Deeper, representation-level interventions are needed for true erasure
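The first two findings can be illustrated with a toy sketch. It is not the paper's probe and not an implementation of Unified Concept Editing or Erased Stable Diffusion: it assumes a single linear projection standing in for an attention layer, a simplified rank-one edit that redirects one concept embedding to an anchor, and plain gradient descent standing in for fine-tuning. All names (`redirect_concept`, `finetune`, `W_orig`, `concept`, `anchor`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 8, 6

# Hypothetical stand-in for one projection matrix of a pretrained model.
W_orig = rng.normal(size=(d_out, d_in))


def unit(v):
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)


concept = unit(rng.normal(size=d_in))  # embedding of the concept to erase
anchor = unit(rng.normal(size=d_in))   # embedding the concept is redirected to


def redirect_concept(W, concept, anchor):
    """Rank-one edit: the edited matrix maps `concept` to W @ anchor.

    Inputs orthogonal to `concept` are left unchanged (assumed toy edit).
    """
    delta = W @ anchor - W @ concept
    return W + np.outer(delta, concept) / (concept @ concept)


def finetune(W, concept, target, steps=5, lr=0.25):
    """A few gradient steps on the loss ||W @ concept - target||^2."""
    W = W.copy()
    for _ in range(steps):
        residual = W @ concept - target
        W -= lr * 2.0 * np.outer(residual, concept)  # gradient of the loss
    return W


target = W_orig @ concept  # what the unedited model produced for the concept
W_erased = redirect_concept(W_orig, concept, anchor)
W_recovered = finetune(W_erased, concept, target)

err_erased = np.linalg.norm(W_erased @ concept - target)
err_recovered = np.linalg.norm(W_recovered @ concept - target)
print(f"output error after edit:      {err_erased:.4f}")
print(f"output error after 5 steps:   {err_recovered:.4f}")
```

Because `concept` has unit norm and `lr` is 0.25, each step halves the residual, so five steps undo most of the edit. The toy only shows that a prompt-specific edit can be cheap to reverse; it cannot tell whether a real model's concept was dormant or was relearned by the fine-tuning itself.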
Abstract
To what extent does concept erasure eliminate generative capacity in diffusion models? While prior evaluations have primarily focused on measuring concept suppression under specific textual prompts, we explore a complementary and fundamental question: do current concept erasure techniques genuinely remove the ability to generate targeted concepts, or do they merely achieve superficial, prompt-specific suppression? We systematically evaluate the robustness and reversibility of two representative concept erasure methods, Unified Concept Editing and Erased Stable Diffusion, by probing their ability to eliminate targeted generative behaviors in text-to-image models. These methods attempt to suppress undesired semantic concepts by modifying internal model parameters, either through targeted attention edits or model-level fine-tuning strategies. To rigorously assess whether these techniques…
Peer Reviews
Decision · Submitted to ICLR 2026
- The proposed probes for testing the reversibility of concept erasure make sense.
- The experiments are extensive, covering 6 popular erasure methods for diffusion models.
- This paper seems to reinvent the wheel, because it is well known that erased concepts can be easily restored using personalization methods such as Textual Inversion or DreamBooth. Moreover, some recent papers have indicated that quantization can restore erased concepts, and that even fine-tuning on different classes or concepts can restore an erased class or concept.
- Moreover, the theoretical analysis of the reactivation bound is not clearly presented. Although the results in Theorems 2 and 3 make sense and quite tri
* The paper is generally well written, and the question of whether current erasure methods merely perform suppression is well motivated.
* The theoretical analysis showing that the reactivated model remains close to the unerased model is appreciated.
* Since all of the probing techniques update the weights, I do not see how this is different from simply fine-tuning the model to learn the supposedly erased concepts.
* It is unclear to me whether the erased concepts are reactivated because they were not fully erased, or simply because the probing techniques are very good at making the model learn concepts.
- The paper presents an interesting theoretical result: the expected squared difference between the unlearned and recovered models converges to a steady-state bound, implying that the difference between their parameters remains bounded
- Writing issue: The statement in line 151, “as a result, erased concepts are often conditionally suppressed rather than fully removed”, is not well supported by experimental or theoretical evidence.
- The Instance-Personalization Probe is conceptually similar to the approach in [1] (which uses DreamBooth rather than Textual Inversion, as in [1]). It inherits a critical limitation: the strong dependence of personalization quality on the choice of the reference set $\mathcal{X}_{ref}$.
- Using
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Topic Modeling · Generative Adversarial Networks and Image Synthesis
Methods: Softmax · Attention Is All You Need · Diffusion
