Disentangled Sparse Representations for Concept-Separated Diffusion Unlearning
Hyeonjin Kim, Hangyeol Jung, Heechan Yun, Sungjun Yun, and Dong-Jun Han

TL;DR
This paper introduces SAEParate, a sparse autoencoder (SAE)-based method that organizes latent representations into concept-specific clusters, enabling more precise concept unlearning in text-to-image diffusion models.
Contribution
SAEParate combines a concept-aware contrastive objective with a GeLU-based nonlinear encoder to separate concepts explicitly in latent space and improve unlearning performance.
Findings
State-of-the-art unlearning performance on UnlearnCanvas
Strong gains in joint style-object unlearning
Reduced interference between target and non-target concepts
Abstract
Unlearning specific concepts in text-to-image diffusion models has become increasingly important for preventing undesirable content generation. Among prior approaches, sparse autoencoder (SAE)-based methods have attracted attention due to their ability to suppress target concepts through lightweight manipulation of latent features, without modifying model parameters. However, SAEs trained with sparse reconstruction objectives do not explicitly enforce concept-wise separation, resulting in shared latent features across concepts. To address this, we propose SAEParate, which organizes latent representations into concept-specific clusters via a concept-aware contrastive objective, enabling more precise concept suppression while reducing unintended interference during unlearning. In addition, we enhance the encoder with a GeLU-based nonlinear transformation to increase its expressive…
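To make the ingredients named in the abstract concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes a hypothetical two-layer encoder with a GeLU hidden layer, a standard reconstruction-plus-L1 SAE loss, and a supervised-contrastive (SupCon-style) loss standing in for the concept-aware contrastive objective, whose exact form the abstract does not give. All function names, shapes, and hyperparameters (`l1_coeff`, `temperature`, `scale`) are illustrative assumptions.

```python
import numpy as np

def gelu(x):
    # Tanh approximation of GELU.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def encode(x, W1, b1, W2, b2):
    # Assumed encoder: GELU hidden layer, then ReLU so the codes are
    # non-negative and can be driven sparse by the L1 penalty.
    h = gelu(x @ W1 + b1)
    return np.maximum(h @ W2 + b2, 0.0)

def decode(z, W_dec, b_dec):
    # Linear decoder back to the activation space.
    return z @ W_dec + b_dec

def sae_loss(x, z, x_hat, l1_coeff=1e-3):
    # Standard sparse reconstruction objective: MSE plus L1 on the codes.
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=1))
    sparsity = np.mean(np.sum(np.abs(z), axis=1))
    return recon + l1_coeff * sparsity

def concept_contrastive_loss(z, labels, temperature=0.1):
    # SupCon-style stand-in: pull codes with the same concept label
    # together and push codes with different labels apart.
    zn = z / np.maximum(np.linalg.norm(z, axis=1, keepdims=True), 1e-8)
    sim = zn @ zn.T / temperature
    n = z.shape[0]
    not_self = ~np.eye(n, dtype=bool)
    sim = sim - sim.max(axis=1, keepdims=True)  # numerical stability
    exp_sim = np.exp(sim) * not_self
    log_prob = sim - np.log(exp_sim.sum(axis=1, keepdims=True) + 1e-12)
    pos = (labels[:, None] == labels[None, :]) & not_self
    pos_count = pos.sum(axis=1)
    valid = pos_count > 0  # anchors with at least one same-concept sample
    if not valid.any():
        return 0.0
    per_anchor = -(log_prob * pos).sum(axis=1)[valid] / pos_count[valid]
    return float(per_anchor.mean())

def suppress_concept(z, concept_features, scale=0.0):
    # Unlearning at inference time: rescale the latent features assigned
    # to the target concept; model weights are left untouched.
    z = z.copy()
    z[:, concept_features] *= scale
    return z
```

In this sketch, training would minimize `sae_loss` plus a weighted `concept_contrastive_loss`; at generation time the activations would be encoded, passed through `suppress_concept` with the indices of the target concept's features, and decoded. The cleaner the concept clusters, the fewer non-target features share indices with the suppressed ones, which is the interference the abstract describes.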
