Understanding Implosion in Text-to-Image Generative Models
Wenxin Ding, Cathy Y. Li, Shawn Shan, Ben Y. Zhao, Haitao Zheng

TL;DR
This paper introduces an analytical framework for understanding how poisoning attacks cause model implosion in text-to-image generative models. It does so by analyzing the cross-attention mechanism and quantifying alignment difficulty.
Contribution
It models cross-attention training under poisoning as a supervised graph alignment problem and provides a formal measure, alignment difficulty (AD), to predict when data poisoning will cause a model to implode.
Findings
Higher AD correlates with increased likelihood of model implosion.
Poisoning more concepts increases alignment difficulty and distortion.
The framework explains and predicts model failures under poisoning attacks.
Abstract
Recent works show that text-to-image generative models are surprisingly vulnerable to a variety of poisoning attacks. Empirical results find that these models can be corrupted by altering associations between individual text prompts and associated visual features. Furthermore, a number of concurrent poisoning attacks can induce "model implosion," where the model becomes unable to produce meaningful images for unpoisoned prompts. These intriguing findings highlight the absence of an intuitive framework to understand poisoning attacks on these models. In this work, we establish the first analytical framework on robustness of image generative models to poisoning attacks, by modeling and analyzing the behavior of the cross-attention mechanism in latent diffusion models. We model cross-attention training as an abstract problem of "supervised graph alignment" and formally quantify the impact…
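For readers unfamiliar with the cross-attention mechanism the abstract refers to, the following is a minimal single-head sketch in NumPy of how image latents attend to text-token embeddings. It is not the paper's model or its graph-alignment formulation; the function name `cross_attention`, the weight matrices, the shapes, and the random inputs are all illustrative assumptions.

```python
import numpy as np

def cross_attention(latents, text_emb, w_q, w_k, w_v):
    """Single-head cross-attention: image latents (queries) attend to text tokens (keys/values)."""
    q = latents @ w_q                      # (n_positions, d)
    k = text_emb @ w_k                     # (n_tokens, d)
    v = text_emb @ w_v                     # (n_tokens, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # Numerically stable softmax over the text-token axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

# Hypothetical toy sizes, chosen only for illustration.
rng = np.random.default_rng(0)
latents = rng.normal(size=(16, 8))    # 16 spatial positions, 8 channels
text_emb = rng.normal(size=(4, 12))   # 4 text tokens, 12-dim embeddings
w_q = rng.normal(size=(8, 6))
w_k = rng.normal(size=(12, 6))
w_v = rng.normal(size=(12, 6))

out, weights = cross_attention(latents, text_emb, w_q, w_k, w_v)
```

Each row of `weights` is a distribution over text tokens, so it describes which tokens a given image position draws its visual features from. This is the text-to-visual association that, per the abstract, poisoning attacks alter.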
Taxonomy
Methods: Diffusion
