Minimalist Concept Erasure in Generative Models

Yang Zhang; Er Jin; Yanfei Dong; Yixuan Wu; Philip Torr; Ashkan Khakzar; Johannes Stegmaier; Kenji Kawaguchi

arXiv:2507.13386·cs.CV·July 21, 2025

TL;DR

This paper introduces a minimalist concept erasure method for generative models. Its objective is defined only on the distributional distance of the final generation outputs, with the aim of removing unwanted concepts for safety while preserving the model's overall utility.

Contribution

The paper proposes a distribution-based erasure objective, a tractable loss optimized by backpropagating through all generation steps, and neuron masking techniques, so that concepts can be erased with minimal modifications to the model.

Findings

01 Successfully erases concepts without performance loss

02 The method is robust across state-of-the-art models

03 The approach enhances safety and responsibility in generative modeling

Abstract

Recent advances in generative models have demonstrated remarkable capabilities in producing high-quality images, but their reliance on large-scale unlabeled data has raised significant safety and copyright concerns. Efforts to address these issues by erasing unwanted concepts have shown promise. However, many existing erasure methods involve excessive modifications that compromise the overall utility of the model. In this work, we address these issues by formulating a novel minimalist concept erasure objective based only on the distributional distance of final generation outputs. Building on our formulation, we derive a tractable loss for differentiable optimization that leverages backpropagation through all generation steps in an end-to-end manner. We also conduct extensive analysis to show theoretical connections with other models and methods. To improve the robustness of the…
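The idea of a loss defined only on the final output and differentiated through every generation step can be illustrated with a deliberately tiny sketch. This is not the paper's implementation: the scalar "generator", the squared distance (standing in for a distributional distance), the per-unit mask, and all names and values below are hypothetical and chosen only to keep the example self-contained.

```python
# Toy illustration only: a scalar "generator" refined over several steps.
# All names, sizes, and constants are hypothetical, not taken from the paper.

def generate(mask, weights, concept, steps=10, eta=0.3, x0=0.0):
    """Run the unrolled sampler and return its final output."""
    # Masked linear unit: each mask entry scales one hidden unit's contribution.
    drive = sum(m * w * c for m, w, c in zip(mask, weights, concept))
    x = x0
    for _ in range(steps):
        x = x + eta * (drive - x)  # one refinement step toward the drive
    return x


def loss_and_grad(mask, weights, concept, target_out, steps=10, eta=0.3):
    """Squared distance on the final output only, with the gradient w.r.t.
    the mask accumulated in reverse through every generation step."""
    x_final = generate(mask, weights, concept, steps, eta)
    loss = (x_final - target_out) ** 2
    g_x = 2.0 * (x_final - target_out)  # dL/dx_T
    g_drive = 0.0
    for _ in range(steps):              # walk backwards over the steps
        g_drive += g_x * eta            # every step reads the drive once
        g_x *= 1.0 - eta                # pass the gradient to the previous state
    grad = [g_drive * w * c for w, c in zip(weights, concept)]
    return loss, grad


def erase(weights, concept_to_erase, neutral_concept, iters=200, lr=0.05):
    """Learn a mask so the erased concept's output matches the neutral output."""
    ones = [1.0] * len(weights)
    # Reference: what the unmodified model produces for a neutral prompt.
    target_out = generate(ones, weights, neutral_concept)
    mask = ones[:]
    for _ in range(iters):
        _, grad = loss_and_grad(mask, weights, concept_to_erase, target_out)
        # Gradient step, then clip each mask entry back into [0, 1].
        mask = [min(1.0, max(0.0, m - lr * g)) for m, g in zip(mask, grad)]
    return mask
```

For example, `erase([2.0, 1.0], [1.0, 0.0], [0.0, 1.0])` only adjusts the mask entry for the unit that the erased concept actually activates; the other entry receives zero gradient and stays at 1.0, so the toy model's output for the neutral prompt is unchanged. A real generative model would replace the scalar update with its sampler and the squared distance with whatever distributional distance the method specifies.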
