AdvAnchor: Enhancing Diffusion Model Unlearning with Adversarial Anchors

Mengnan Zhao; Lihe Zhang; Xingyi Yang; Tianhang Zheng; Baocai Yin

arXiv:2501.00054 · cs.LG · January 3, 2025

TL;DR

AdvAnchor introduces adversarial anchors to improve the unlearning of unsafe concepts in text-to-image diffusion models, balancing the removal of those concepts against the preservation of other concepts more effectively than previous methods.

Contribution

The paper proposes AdvAnchor, which generates adversarial anchors that closely resemble the embeddings of undesirable concepts, so that overall model quality is maintained, while excluding the defining attributes of those concepts, so that they are effectively erased.
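The idea of an anchor that stays close to a concept while losing its defining attribute can be illustrated with a toy sketch. This is not the authors' procedure: it assumes, hypothetically, that a concept is a single embedding vector `concept_emb` and that its defining attribute corresponds to one direction `attr_dir`. A projected signed-gradient (PGD-style) loop then searches an L-infinity ball of radius `eps` around the concept embedding, which keeps the anchor close to it, for a point whose projection onto the attribute direction is as small as possible. The function name, the vectors, and the hyperparameters are all illustrative.

```python
import numpy as np


def craft_anchor(concept_emb, attr_dir, eps=0.5, alpha=0.01, steps=200):
    """Hypothetical toy anchor: stay within an L-inf ball of radius eps around
    concept_emb while driving the projection onto attr_dir toward zero."""
    a = attr_dir / np.linalg.norm(attr_dir)  # unit attribute direction
    delta = np.zeros_like(concept_emb)
    for _ in range(steps):
        score = (concept_emb + delta) @ a    # how strongly the attribute is present
        grad = 2.0 * score * a               # gradient of score**2 w.r.t. delta
        # Signed gradient step, then projection back into the eps-ball.
        delta = np.clip(delta - alpha * np.sign(grad), -eps, eps)
    return concept_emb + delta


# Usage with synthetic vectors (assumed, not taken from the paper).
rng = np.random.default_rng(0)
a = rng.normal(size=16)
a /= np.linalg.norm(a)
base = rng.normal(size=16)
base -= (base @ a) * a        # remove any attribute component from the base
concept = base + 2.0 * a      # hypothetical concept: base content plus the attribute
anchor = craft_anchor(concept, a)
print(f"attribute score: concept={concept @ a:.3f}, anchor={anchor @ a:.3f}")
print(f"max coordinate change: {np.max(np.abs(anchor - concept)):.3f}")
```

An L-infinity budget is a common choice in adversarial-example work; whether AdvAnchor uses a comparable constraint, or how it identifies a concept's defining attributes, is not stated in this summary.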

Findings

1. AdvAnchor outperforms existing unlearning methods in the reported experiments.
2. Adversarial anchors effectively balance the removal of target concepts against the preservation of other concepts.
3. The approach preserves overall model performance after unlearning.

Abstract

Security concerns surrounding text-to-image diffusion models have driven researchers to unlearn inappropriate concepts through fine-tuning. Recent fine-tuning methods typically align the prediction distributions of unsafe prompts with those of predefined text anchors. However, these techniques exhibit a considerable performance trade-off between eliminating undesirable concepts and preserving other concepts. In this paper, we systematically analyze the impact of diverse text anchors on unlearning performance. Guided by this analysis, we propose AdvAnchor, a novel approach that generates adversarial anchors to alleviate the trade-off issue. These adversarial anchors are crafted to closely resemble the embeddings of undesirable concepts to maintain overall model performance, while selectively excluding defining attributes of these concepts for effective erasure. Extensive experiments…
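To make the baseline concrete, here is a toy, hypothetical version of the anchor-alignment objective that the abstract attributes to earlier fine-tuning methods: the fine-tuned model's prediction for an unsafe prompt is pulled toward a frozen copy's prediction for an anchor prompt. It assumes a linear noise predictor eps(x, c) = A x + B c with A frozen, so the image term cancels and only the text projection `B` is updated. The preservation penalty weighted by `lam`, the function name `alignment_step`, and the one-hot embeddings are all assumptions made for illustration, not details from the paper.

```python
import numpy as np


def alignment_step(B, B_frozen, c_unsafe, c_anchor, c_keep, lr=0.05, lam=1.0):
    """One gradient step on a hypothetical anchor-alignment objective.

    With eps(x, c) = A @ x + B @ c and A frozen, the A @ x terms cancel,
    so only the text projection B appears in the loss.
    """
    target = B_frozen @ c_anchor             # frozen model's prediction for the anchor
    r_u = B @ c_unsafe - target              # erasure residual: unsafe prompt vs. anchor
    r_k = B @ c_keep - B_frozen @ c_keep     # preservation residual for an unrelated prompt
    loss = r_u @ r_u + lam * (r_k @ r_k)
    grad = 2.0 * np.outer(r_u, c_unsafe) + 2.0 * lam * np.outer(r_k, c_keep)
    return B - lr * grad, loss


# Usage: one-hot stand-ins for the text embeddings (assumed).
rng = np.random.default_rng(0)
B_frozen = rng.normal(size=(3, 4))
c_unsafe, c_anchor, c_keep = np.eye(4)[0], np.eye(4)[1], np.eye(4)[2]

B = B_frozen.copy()
losses = []
for _ in range(300):
    B, loss = alignment_step(B, B_frozen, c_unsafe, c_anchor, c_keep)
    losses.append(loss)
print(f"loss: first={losses[0]:.4f}, last={losses[-1]:.2e}")
```

Because the one-hot stand-ins are orthogonal, the erasure and preservation terms touch different columns of `B` and do not interact; this toy is too small to reproduce the erasure/preservation trade-off that the abstract discusses.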

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications

Methods: Diffusion · ALIGN