CURE: Concept Unlearning via Orthogonal Representation Editing in Diffusion Models
Shristi Das Biswas, Arani Roy, Kaushik Roy

TL;DR
CURE is a training-free, efficient framework that unlearns undesired concepts in pre-trained diffusion models by orthogonally editing their weights, improving safety and specificity without retraining.
Contribution
The paper introduces CURE, a novel orthogonal weight-space concept unlearning method that operates in closed form, enabling rapid and precise removal of unwanted concepts in diffusion models.
Findings
Removes concepts faster than prior methods, in 2 seconds.
Effectively unlearns targeted concepts with minimal impact on unrelated capabilities.
Demonstrates robustness against red-teaming and improved safety.
Abstract
As text-to-image models continue to evolve, so does the risk of generating unsafe, copyrighted, or privacy-violating content. Existing safety interventions, ranging from training-data curation and model fine-tuning to inference-time filtering and guidance, often suffer from incomplete concept removal, susceptibility to jailbreaking, computational inefficiency, or collateral damage to unrelated capabilities. In this paper, we introduce CURE, a training-free concept unlearning framework that operates directly in the weight space of pre-trained diffusion models, enabling fast, interpretable, and highly specific suppression of undesired concepts. At the core of our method is the Spectral Eraser, a closed-form, orthogonal projection module that identifies discriminative subspaces using Singular Value Decomposition over token embeddings associated with the concepts to forget and retain.…
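The abstract describes the Spectral Eraser only at a high level, so what follows is a minimal NumPy sketch of the general idea under stated assumptions, not the authors' implementation. It assumes the forget and retain concepts are given as matrices of token embeddings (one row per token), takes the top right singular vectors of each as the concept subspaces, treats the "discriminative" subspace as the part of the forget subspace orthogonal to the retain subspace (an assumption), and applies the closed-form edit W' = W (I - P) to a weight matrix W that consumes text embeddings, such as a cross-attention key or value projection. The names `top_subspace` and `spectral_erase` and the ranks `k_forget` and `k_retain` are hypothetical.

```python
import numpy as np


def top_subspace(emb: np.ndarray, k: int) -> np.ndarray:
    """Return a d x k orthonormal basis of the top-k right singular vectors of emb (n x d)."""
    _, _, vt = np.linalg.svd(emb, full_matrices=False)
    return vt[:k].T


def spectral_erase(W, forget_emb, retain_emb, k_forget, k_retain, tol=1e-8):
    """Hypothetical closed-form edit W' = W (I - P), where P projects onto the
    part of the forget subspace that is orthogonal to the retain subspace."""
    d = W.shape[1]
    U_f = top_subspace(forget_emb, k_forget)  # assumed forget-concept subspace
    U_r = top_subspace(retain_emb, k_retain)  # assumed retain-concept subspace
    # Strip the retain-subspace component from each forget direction.
    D = U_f - U_r @ (U_r.T @ U_f)
    # Orthonormalize what is left, dropping directions that vanished entirely.
    Q, s, _ = np.linalg.svd(D, full_matrices=False)
    Q = Q[:, s > tol]
    P = Q @ Q.T
    return W @ (np.eye(d) - P), Q


# Toy usage with random stand-ins for token embeddings and a cross-attention weight.
rng = np.random.default_rng(0)
d = 64
W = rng.standard_normal((32, d))
forget = rng.standard_normal((5, d))
retain = rng.standard_normal((20, d))
W_new, Q = spectral_erase(W, forget, retain, k_forget=5, k_retain=20)
print(np.allclose(W_new @ retain.T, W @ retain.T))  # True: retain tokens map as before
print(np.allclose(W_new @ Q, 0.0))                 # True: erased directions map to zero
```

Because P is built only from directions orthogonal to the retain subspace, any embedding lying in that subspace passes through W' unchanged, while the forget-specific component of each forget embedding is zeroed. Nothing here requires gradient steps, which is consistent with the training-free, closed-form framing, though the paper's actual subspace selection and weighting may differ.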
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Face recognition and analysis
Methods: Diffusion
