TL;DR
This paper introduces a minimalist concept erasure method for generative models. It removes unwanted concepts using an objective based only on the distributional distance of final generation outputs, aiming to improve safety without sacrificing model utility.
Contribution
The paper proposes a distribution-based erasure objective, a tractable loss optimized by backpropagating through all generation steps, and neuron masking techniques, with the aim of erasing concepts through minimal model modifications.
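To make the combination of an output-level objective, end-to-end backpropagation, and neuron masking more concrete, here is a minimal toy sketch. It is not the paper's implementation. Everything in it is an assumption made for illustration: the 3-neuron linear "generator", the constants, the names (`generate`, `loss_and_grad`, `objective`), the sigmoid mask parameterization, the preservation term weighted by `LAM`, and the use of a squared distance between final outputs from one fixed noise sample as a stand-in for a true distributional distance.

```python
import math

# Toy setup: every value below is hypothetical and chosen only for illustration.
A = [[-0.5, 0.1, 0.0],
     [0.0, -0.5, 0.1],
     [0.1, 0.0, -0.5]]        # fixed weights of a 3-neuron layer
X0 = [0.5, -0.3, 0.8]         # fixed initial noise
C_TARGET = [2.0, 0.0, 0.5]    # conditioning for the concept to erase
C_ANCHOR = [0.0, 0.0, 0.5]    # conditioning for a neutral replacement
C_KEEP = [0.0, 1.0, -0.5]     # unrelated concept that should be preserved
STEPS, H, LAM = 5, 0.2, 1.0

def mat_vec(m, x):
    return [sum(a * b for a, b in zip(row, x)) for row in m]

def mat_t_vec(m, g):
    return [sum(m[i][j] * g[i] for i in range(len(m))) for j in range(len(m[0]))]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def generate(mask, cond):
    """Run the multi-step toy generator; return all states and neuron activations."""
    xs, acts = [X0], []
    for _ in range(STEPS):
        a = [u + c for u, c in zip(mat_vec(A, xs[-1]), cond)]
        acts.append(a)
        xs.append([x + H * m * ai for x, m, ai in zip(xs[-1], mask, a)])
    return xs, acts

def loss_and_grad(mask, cond, target):
    """Squared distance of the final output to `target`, and its gradient w.r.t. mask."""
    xs, acts = generate(mask, cond)
    diff = [x - t for x, t in zip(xs[-1], target)]
    loss = sum(d * d for d in diff)
    g = [2.0 * d for d in diff]              # gradient w.r.t. the final state
    grad_mask = [0.0] * len(mask)
    for a in reversed(acts):                 # backpropagate through every step
        grad_mask = [gm + H * gi * ai for gm, gi, ai in zip(grad_mask, g, a)]
        back = mat_t_vec(A, [m * gi for m, gi in zip(mask, g)])
        g = [gi + H * b for gi, b in zip(g, back)]
    return loss, grad_mask

def objective(logits, refs):
    """Erasure term plus LAM times a preservation term, differentiated w.r.t. mask logits."""
    mask = [sigmoid(v) for v in logits]
    l_erase, g_erase = loss_and_grad(mask, C_TARGET, refs["anchor"])
    l_keep, g_keep = loss_and_grad(mask, C_KEEP, refs["keep"])
    grad = [(ge + LAM * gk) * m * (1.0 - m)
            for ge, gk, m in zip(g_erase, g_keep, mask)]
    return l_erase + LAM * l_keep, grad

# Reference outputs come from the unmasked (original) toy model.
ones = [1.0] * 3
refs = {"anchor": generate(ones, C_ANCHOR)[0][-1],
        "keep": generate(ones, C_KEEP)[0][-1]}

logits = [2.0] * 3                           # start close to "all neurons on"
initial_loss, _ = objective(logits, refs)
for _ in range(300):                         # plain gradient descent on the logits
    _, grad = objective(logits, refs)
    logits = [v - 0.3 * g for v, g in zip(logits, grad)]
final_loss, _ = objective(logits, refs)
mask = [sigmoid(v) for v in logits]
print("loss before:", initial_loss, "after:", final_loss, "mask:", mask)
```

In this sketch only the mask logits are trained while the weights `A` stay frozen, which is one simple way to keep the modification small. A real diffusion or flow model would rely on an autodiff framework rather than the hand-written backward pass used here.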
Findings
Erases target concepts without degrading overall model performance
Remains robust across state-of-the-art models
Supports safer and more responsible generative modeling
Abstract
Recent advances in generative models have demonstrated remarkable capabilities in producing high-quality images, but their reliance on large-scale unlabeled data has raised significant safety and copyright concerns. Efforts to address these issues by erasing unwanted concepts have shown promise. However, many existing erasure methods involve excessive modifications that compromise the overall utility of the model. In this work, we address these issues by formulating a novel minimalist concept erasure objective based only on the distributional distance of final generation outputs. Building on our formulation, we derive a tractable loss for differentiable optimization that leverages backpropagation through all generation steps in an end-to-end manner. We also conduct extensive analysis to show theoretical connections with other models and methods. To improve the robustness of the…