AdvI2I: Adversarial Image Attack on Image-to-Image Diffusion models
Yaopei Zeng, Yuanpu Cao, Bochuan Cao, Yurui Chang, Jinghui Chen, Lu Lin

TL;DR
This paper reveals a new vulnerability in image-to-image diffusion models where adversarial images can induce NSFW content generation, bypassing existing defenses, and proposes a framework to craft such images.
Contribution
The paper introduces AdvI2I, a novel adversarial image attack framework targeting I2I diffusion models, and enhances it with AdvI2I-Adaptive to improve resilience against defenses.
Findings
AdvI2I effectively bypasses current safeguards such as Safe Latent Diffusion (SLD) without altering the text prompt.
AdvI2I-Adaptive reduces the adversarial images' resemblance to NSFW embeddings, making the attack more resilient against defenses.
The attacks demonstrate a significant security risk in I2I diffusion models.
Abstract
Recent advances in diffusion models have significantly enhanced the quality of image synthesis, yet they have also introduced serious safety concerns, particularly the generation of Not Safe for Work (NSFW) content. Previous research has demonstrated that adversarial prompts can be used to generate NSFW content. However, such adversarial text prompts are often easily detectable by text-based filters, limiting their efficacy. In this paper, we expose a previously overlooked vulnerability: adversarial image attacks targeting Image-to-Image (I2I) diffusion models. We propose AdvI2I, a novel framework that manipulates input images to induce diffusion models to generate NSFW content. By optimizing a generator to craft adversarial images, AdvI2I circumvents existing defense mechanisms, such as Safe Latent Diffusion (SLD), without altering the text prompts. Furthermore, we introduce…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Bacillus and Francisella bacterial research · Cell Image Analysis Techniques
Methods: Diffusion
