AdvI2I: Adversarial Image Attack on Image-to-Image Diffusion models

Yaopei Zeng; Yuanpu Cao; Bochuan Cao; Yurui Chang; Jinghui Chen; Lu Lin

arXiv:2410.21471·cs.CV·September 15, 2025

TL;DR

This paper reveals a new vulnerability in image-to-image diffusion models where adversarial images can induce NSFW content generation, bypassing existing defenses, and proposes a framework to craft such images.

Contribution

The paper introduces AdvI2I, a novel adversarial image attack framework targeting I2I diffusion models, and enhances it with AdvI2I-Adaptive to improve resilience against defenses.

Findings

1. AdvI2I effectively bypasses current safeguards such as Safe Latent Diffusion (SLD).

2. AdvI2I-Adaptive reduces the adversarial images' resemblance to NSFW embeddings.

3. The attacks demonstrate a significant security risk in I2I diffusion models.

Abstract

Recent advances in diffusion models have significantly enhanced the quality of image synthesis, yet they have also introduced serious safety concerns, particularly the generation of Not Safe for Work (NSFW) content. Previous research has demonstrated that adversarial prompts can be used to generate NSFW content. However, such adversarial text prompts are often easily detectable by text-based filters, limiting their efficacy. In this paper, we expose a previously overlooked vulnerability: adversarial image attacks targeting Image-to-Image (I2I) diffusion models. We propose AdvI2I, a novel framework that manipulates input images to induce diffusion models to generate NSFW content. By optimizing a generator to craft adversarial images, AdvI2I circumvents existing defense mechanisms, such as Safe Latent Diffusion (SLD), without altering the text prompts. Furthermore, we introduce…
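
The core idea in the abstract, optimizing a generator that adds a bounded perturbation to an input image so that its encoded features shift toward a target concept, can be illustrated with a toy sketch. This is not the authors' implementation: the linear encoder `E`, the concept direction `c`, the tanh-bounded linear generator, and the values of `eps`, `alpha`, and the step size are all assumptions chosen for illustration, and no diffusion model or real image data is involved.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 64, 16, 4          # batch size, "pixel" dim, feature dim (toy sizes)
eps, alpha = 0.05, 0.1       # perturbation budget, concept shift strength (assumed)

x = rng.uniform(0.2, 0.8, size=(n, d))      # stand-in "clean images"
E = rng.normal(size=(d, k)) / np.sqrt(d)    # frozen toy linear encoder (assumption)
c = rng.normal(size=k)
c /= np.linalg.norm(c)                      # hypothetical unit concept direction


def perturb(W, x):
    """Toy generator: bounded perturbation delta = eps * tanh(x @ W)."""
    return eps * np.tanh(x @ W)


def loss_and_grad(W, x):
    """Squared distance between the feature shift and alpha * c, plus dL/dW."""
    delta = perturb(W, x)
    # The encoder is linear, so f(x + delta) - f(x) = delta @ E.
    r = delta @ E - alpha * c               # residual to the target shift
    loss = np.mean(np.sum(r ** 2, axis=1))
    g_delta = 2.0 * r @ E.T / len(x)        # gradient w.r.t. delta
    g_z = g_delta * eps * (1.0 - (delta / eps) ** 2)  # back through tanh
    return loss, x.T @ g_z                  # gradient w.r.t. W


W = np.zeros((d, d))                        # generator starts as "no perturbation"
initial_loss, _ = loss_and_grad(W, x)
for _ in range(300):                        # plain gradient descent on W
    _, g = loss_and_grad(W, x)
    W -= 2.0 * g
final_loss, _ = loss_and_grad(W, x)

delta = perturb(W, x)
x_adv = np.clip(x + delta, 0.0, 1.0)        # perturbed inputs stay valid "images"
print(f"loss: {initial_loss:.4f} -> {final_loss:.4f}, "
      f"max |delta| = {np.abs(delta).max():.4f}")
```

The tanh bound keeps every perturbation entry within `eps`, which stands in for an imperceptibility constraint. Because the generator is trained once over a batch rather than per image, it can be applied to new inputs without further optimization, which is the property that distinguishes a generator-based attack from per-image perturbation search.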

Code & Models

Repositories

Spinozaaa/AdvI2I (PyTorch, official)

Videos

AdvI2I: Adversarial Image Attack on Image-to-Image Diffusion Models · SlidesLive

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Bacillus and Francisella bacterial research · Cell Image Analysis Techniques

Methods: Diffusion