TL;DR
This paper analyzes vulnerabilities in personalized diffusion models (PDMs), showing that adversarial perturbations cause a latent-space misalignment between images and their text prompts, and proposes a red-teaming framework that makes fine-tuning robust to such perturbations.
Contribution
It uncovers shortcut-learning vulnerabilities in PDMs and introduces a systematic red-teaming framework that combines data purification with contrastive decoupling learning.
Findings
Adversarial perturbations induce latent-space misalignment in PDMs.
The proposed framework outperforms existing purification methods.
The approach enhances robustness against adaptive perturbations.
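
The misalignment finding can be illustrated with a simple similarity check. The sketch below is not the paper's method: it assumes image and text embeddings have already been extracted (for example from a CLIP encoder) and uses hand-written toy vectors as stand-ins. The names `cosine_similarity` and `misalignment_gap` are hypothetical helpers introduced only for this illustration.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two 1-D embedding vectors."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def misalignment_gap(clean_img_emb, perturbed_img_emb, text_emb):
    """Drop in image-text similarity after perturbation.

    A positive value means the perturbed image embedding sits farther
    from the prompt embedding than the clean one does.
    """
    clean_score = cosine_similarity(clean_img_emb, text_emb)
    perturbed_score = cosine_similarity(perturbed_img_emb, text_emb)
    return clean_score - perturbed_score


# Toy stand-ins for real embeddings (assumed values, not measured data).
text_emb = np.array([1.0, 0.0, 0.0])
clean_img_emb = np.array([0.9, 0.1, 0.0])      # nearly parallel to the prompt
perturbed_img_emb = np.array([0.2, 0.9, 0.3])  # pushed away from the prompt

gap = misalignment_gap(clean_img_emb, perturbed_img_emb, text_emb)
print(f"misalignment gap: {gap:.3f}")
```

With real data, the same gap could be averaged over a training set to compare clean, perturbed, and purified images, under the assumption that all embeddings come from the same encoder.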
Abstract
Personalized diffusion models (PDMs) have become prominent for adapting pre-trained text-to-image models to generate images of specific subjects using minimal training data. However, PDMs are susceptible to minor adversarial perturbations, leading to significant degradation when fine-tuned on corrupted datasets. These vulnerabilities are exploited to create protective perturbations that prevent unauthorized image generation. Existing purification methods attempt to red-team the protective perturbation to break the protection but often over-purify images, resulting in information loss. In this work, we conduct an in-depth analysis of the fine-tuning process of PDMs through the lens of shortcut learning. We hypothesize and empirically demonstrate that adversarial perturbations induce a latent-space misalignment between images and their text prompts in the CLIP embedding space. This…
Taxonomy
Topics: Model Reduction and Neural Networks · Machine Learning and Algorithms
Methods: Contrastive Language-Image Pre-training · Diffusion · Contrastive Learning
