Discovering Failure Modes of Text-guided Diffusion Models via Adversarial Search
Qihao Liu, Adam Kortylewski, Yutong Bai, Song Bai, and Alan Yuille

TL;DR
This paper introduces SAGE, an adversarial search method that systematically uncovers failure modes in text-guided diffusion models by exploring prompt and latent spaces, revealing issues like semantic inaccuracies and misalignments.
Contribution
We propose SAGE, the first adversarial search technique for text-guided diffusion models (TDMs). It automatically discovers failure cases in both the prompt space and the latent space, and the discovered failures are validated through human inspection.
Findings
Identified prompts that produce images with incorrect semantics.
Discovered regions in latent space leading to distorted images.
Found latent samples causing unrelated, natural-looking images.
Abstract
Text-guided diffusion models (TDMs) are widely applied but can fail unexpectedly. Common failures include: (i) natural-looking text prompts generating images with the wrong content, or (ii) different random samples of the latent variables that generate vastly different, and even unrelated, outputs despite being conditioned on the same text prompt. In this work, we aim to study and understand the failure modes of TDMs in more detail. To achieve this, we propose SAGE, the first adversarial search method on TDMs that systematically explores the discrete prompt space and the high-dimensional latent space, to automatically discover undesirable behaviors and failure cases in image generation. We use image classifiers as surrogate loss functions during searching, and employ human inspections to validate the identified failures. For the first time, our method enables efficient exploration of…
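The idea of searching the latent space with an image classifier as a surrogate loss can be illustrated with a small toy sketch. This is not SAGE itself: `toy_generator` and `toy_classifier` are hypothetical stand-ins for a text-conditioned diffusion model and a real image classifier, and plain random hill-climbing is swapped in for whatever optimizer the paper actually uses. The norm constraint `max_norm` is likewise an assumption, meant to keep candidates in a region a Gaussian prior would plausibly sample.

```python
import math
import random


def toy_generator(z):
    """Hypothetical stand-in for a text-conditioned generator: latent -> 'image' features."""
    return [math.tanh(v) for v in z]


def toy_classifier(x):
    """Hypothetical surrogate classifier: probability that x shows the prompted class."""
    logit = 4.0 * sum(x) / len(x)
    return 1.0 / (1.0 + math.exp(-logit))


def project(z, max_norm):
    """Rescale z so its Euclidean norm does not exceed max_norm."""
    norm = math.sqrt(sum(v * v for v in z))
    if norm <= max_norm:
        return list(z)
    return [v * max_norm / norm for v in z]


def search_failure_latent(z0, steps=500, step_size=0.2, max_norm=None, seed=0):
    """Random hill-climbing that lowers the classifier score of the generated output.

    A low score means the surrogate classifier no longer recognizes the
    prompted class, which flags the latent as a candidate failure case.
    """
    rng = random.Random(seed)
    if max_norm is None:
        # Assumed plausibility bound: 1.5x the typical norm of a standard Gaussian sample.
        max_norm = 1.5 * math.sqrt(len(z0))
    best_z = project(z0, max_norm)
    best_score = toy_classifier(toy_generator(best_z))
    for _ in range(steps):
        candidate = project(
            [v + step_size * rng.gauss(0.0, 1.0) for v in best_z], max_norm
        )
        score = toy_classifier(toy_generator(candidate))
        if score < best_score:  # keep only proposals that lower the surrogate score
            best_z, best_score = candidate, score
    return best_z, best_score


if __name__ == "__main__":
    z_start = [1.0] * 8
    print("initial score:", toy_classifier(toy_generator(z_start)))
    z_adv, adv_score = search_failure_latent(z_start)
    print("score after search:", adv_score)
```

In a real setting, candidates found this way would still need human inspection, as the abstract notes, because a low classifier score may reflect a classifier error rather than a genuine generation failure.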
Taxonomy
Topics: Model Reduction and Neural Networks · Topic Modeling · Generative Adversarial Networks and Image Synthesis
Methods: fail · Diffusion · Contrastive Language-Image Pre-training
