Towards Resilient Safety-driven Unlearning for Diffusion Models against Downstream Fine-tuning
Boheng Li, Renjie Gu, Junjie Wang, Leyi Qi, Yiming Li, Run Wang, Zhan Qin, Tianwei Zhang

TL;DR
This paper introduces ResAlign, a safety-driven unlearning framework for diffusion models that retains both its safety behavior and its benign generation quality after downstream fine-tuning, addressing the fragility of existing unlearning methods.
Contribution
ResAlign models downstream fine-tuning as an implicit optimization problem and uses meta-learning to make safety-driven unlearning more resilient to that fine-tuning.
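The abstract describes the implicit optimization as using a Moreau envelope-based reformulation. As a rough illustration of that general tool only (not the paper's algorithm), the sketch below computes the Moreau envelope of a toy one-dimensional quadratic loss and its gradient. The loss `0.5 * a * (phi - c)**2`, the function names, and the parameters `a`, `c`, and `lam` are all assumptions made for this example. The point it shows is that the envelope's gradient at `theta` needs only the inner minimizer (the "fine-tuned" point), with no differentiation through the inner optimization steps.

```python
# Toy illustration of a Moreau envelope; hypothetical names, not from the paper.
# Inner problem: phi*(theta) = argmin_phi f(phi) + (phi - theta)^2 / (2 * lam)
# with the assumed toy loss f(phi) = 0.5 * a * (phi - c)^2.


def prox_quadratic(theta, a, c, lam):
    """Closed-form minimizer of the proximally regularized toy loss."""
    return (theta + lam * a * c) / (1.0 + lam * a)


def moreau_envelope(theta, a, c, lam):
    """Value of the regularized objective at its minimizer."""
    phi = prox_quadratic(theta, a, c, lam)
    return 0.5 * a * (phi - c) ** 2 + (phi - theta) ** 2 / (2.0 * lam)


def moreau_grad(theta, a, c, lam):
    """Envelope gradient: (theta - prox(theta)) / lam, using only the minimizer."""
    return (theta - prox_quadratic(theta, a, c, lam)) / lam


if __name__ == "__main__":
    theta, a, c, lam = 3.0, 2.0, 1.0, 0.5
    eps = 1e-6
    # Central finite difference as an independent check on the gradient formula.
    numeric = (
        moreau_envelope(theta + eps, a, c, lam)
        - moreau_envelope(theta - eps, a, c, lam)
    ) / (2.0 * eps)
    print("prox:", prox_quadratic(theta, a, c, lam))
    print("envelope:", moreau_envelope(theta, a, c, lam))
    print("analytic grad:", moreau_grad(theta, a, c, lam))
    print("numeric grad:", numeric)
```

In a real diffusion model the inner minimizer has no closed form and would be approximated by running a fine-tuning procedure; how ResAlign does this and combines it with meta-learning is not specified in this summary.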
Findings
ResAlign outperforms prior methods in safety retention after fine-tuning.
ResAlign effectively preserves benign image generation capabilities.
Extensive experiments validate ResAlign's robustness across datasets and fine-tuning scenarios.
Abstract
Text-to-image (T2I) diffusion models have achieved impressive image generation quality and are increasingly fine-tuned for personalized applications. However, these models often inherit unsafe behaviors from toxic pretraining data, raising growing safety concerns. While recent safety-driven unlearning methods have made promising progress in suppressing model toxicity, they are found to be fragile to downstream fine-tuning, as we reveal that state-of-the-art methods largely fail to retain their effectiveness even when fine-tuned on entirely benign datasets. To mitigate this problem, in this paper, we propose ResAlign, a safety-driven unlearning framework with enhanced resilience against downstream fine-tuning. By modeling downstream fine-tuning as an implicit optimization problem with a Moreau envelope-based reformulation, ResAlign enables efficient gradient estimation to minimize the…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Model Reduction and Neural Networks · Image Enhancement Techniques
