Towards Resilient Safety-driven Unlearning for Diffusion Models against Downstream Fine-tuning

Boheng Li; Renjie Gu; Junjie Wang; Leyi Qi; Yiming Li; Run Wang; Zhan Qin; Tianwei Zhang

arXiv:2507.16302 · cs.LG · December 9, 2025

Open Access

TL;DR

This paper introduces ResAlign, a novel safety-driven unlearning framework for diffusion models that maintains safety and benign performance even after downstream fine-tuning, addressing the fragility of existing methods.

Contribution

ResAlign models downstream fine-tuning as an implicit optimization problem and uses meta-learning to enhance safety unlearning resilience against fine-tuning.
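
The idea of differentiating through a fine-tuning step can be illustrated with a one-dimensional toy problem. The sketch below is not the paper's implementation: it assumes fine-tuning is modeled as a proximal problem (downstream loss plus a quadratic penalty that keeps the fine-tuned weight near the starting weight), uses a quadratic downstream loss so the answer can be checked, and all names (`simulate_finetune`, `implicit_hypergradient`, `a`, `c`, `lam`, `s`) are hypothetical. It computes the gradient of a post-fine-tuning safety loss with respect to the pre-fine-tuning weight via the implicit function theorem and compares it with a finite-difference estimate.

```python
# Toy sketch (assumed setup, not the authors' code).
# theta : weight after safety unlearning (scalar here)
# phi   : weight after simulated downstream fine-tuning
# Inner problem: phi*(theta) = argmin_phi 0.5*a*(phi-c)^2 + (phi-theta)^2 / (2*lam)


def inner_objective_grad(phi, theta, a, c, lam):
    """Gradient of the proximal fine-tuning objective with respect to phi."""
    return a * (phi - c) + (phi - theta) / lam


def simulate_finetune(theta, a=2.0, c=3.0, lam=0.5, steps=500, lr=0.1):
    """Run plain gradient descent on the inner objective, starting from theta."""
    phi = theta
    for _ in range(steps):
        phi -= lr * inner_objective_grad(phi, theta, a, c, lam)
    return phi


def safety_loss(phi, s=0.0):
    """Hypothetical safety loss: squared distance from a 'safe' weight s."""
    return 0.5 * (phi - s) ** 2


def implicit_hypergradient(theta, a=2.0, c=3.0, lam=0.5, s=0.0):
    """d safety_loss(phi*(theta)) / d theta via the implicit function theorem.

    Stationarity of the inner problem gives a*(phi*-c) + (phi*-theta)/lam = 0.
    Differentiating in theta yields d phi*/d theta = 1 / (1 + lam * a),
    where a is the (constant) curvature of the downstream loss.
    """
    phi_star = simulate_finetune(theta, a, c, lam)
    dphi_dtheta = 1.0 / (1.0 + lam * a)
    return (phi_star - s) * dphi_dtheta


def finite_difference(theta, eps=1e-5):
    """Central-difference check of the same derivative."""
    up = safety_loss(simulate_finetune(theta + eps))
    down = safety_loss(simulate_finetune(theta - eps))
    return (up - down) / (2.0 * eps)


if __name__ == "__main__":
    theta0 = 1.0
    print("implicit:", implicit_hypergradient(theta0))
    print("finite difference:", finite_difference(theta0))
```

In a meta-learning loop, a gradient of this kind would be added to the ordinary unlearning gradient so that the weights are pushed toward regions where safety survives the simulated fine-tuning. For a real diffusion model the curvature term is not a scalar and would have to be approximated rather than inverted exactly.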

Findings

01. ResAlign outperforms prior methods in safety retention after fine-tuning.

02. ResAlign effectively preserves benign image generation capabilities.

03. Extensive experiments validate ResAlign's robustness across datasets and fine-tuning scenarios.

Abstract

Text-to-image (T2I) diffusion models have achieved impressive image generation quality and are increasingly fine-tuned for personalized applications. However, these models often inherit unsafe behaviors from toxic pretraining data, raising growing safety concerns. While recent safety-driven unlearning methods have made promising progress in suppressing model toxicity, they are found to be fragile to downstream fine-tuning, as we reveal that state-of-the-art methods largely fail to retain their effectiveness even when fine-tuned on entirely benign datasets. To mitigate this problem, in this paper, we propose ResAlign, a safety-driven unlearning framework with enhanced resilience against downstream fine-tuning. By modeling downstream fine-tuning as an implicit optimization problem with a Moreau envelope-based reformulation, ResAlign enables efficient gradient estimation to minimize the…
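
For readers unfamiliar with the term, the standard definition of the Moreau envelope is given below. The abstract above is truncated, so this is the textbook form only, not necessarily the exact formulation used in the paper. The notation is assumed for illustration: \theta for the unlearned weights, \phi for the fine-tuned weights, \mathcal{L}_{\mathrm{ft}} for the downstream fine-tuning loss, and \lambda > 0 for the proximity parameter.

```latex
% Moreau envelope of the fine-tuning loss and its minimizer (the proximal point)
M_\lambda(\theta) = \min_{\phi}\; \mathcal{L}_{\mathrm{ft}}(\phi)
    + \frac{1}{2\lambda}\,\lVert \phi - \theta \rVert^2,
\qquad
\phi^\star(\theta) = \arg\min_{\phi}\; \mathcal{L}_{\mathrm{ft}}(\phi)
    + \frac{1}{2\lambda}\,\lVert \phi - \theta \rVert^2 .

% For convex \mathcal{L}_{\mathrm{ft}}, the envelope is differentiable with
\nabla M_\lambda(\theta) = \frac{1}{\lambda}\bigl(\theta - \phi^\star(\theta)\bigr).
```

The gradient identity is what makes such a reformulation attractive: once the proximal point \phi^\star(\theta) has been found, the envelope's gradient requires no second-order information.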

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Model Reduction and Neural Networks · Image Enhancement Techniques