Universal Image Immunization against Diffusion-based Image Editing via Semantic Injection
Chanhui Lee, Seunghyun Shin, Donggyu Choi, Hae-gon Jeon, Jeany Son

TL;DR
This paper introduces a universal image immunization method that creates a single adversarial perturbation to prevent diffusion-based image editing, enhancing scalability and practicality in defending against malicious semantic manipulations.
Contribution
It presents the first universal immunization framework for diffusion models, generating a single, broadly effective adversarial perturbation without requiring training data or domain knowledge.
Findings
Outperforms baseline methods in universal adversarial perturbation settings.
Achieves results comparable to image-specific methods under limited perturbation budgets.
Demonstrates strong black-box transferability across different diffusion models.
Abstract
Recent advances in diffusion models have enabled powerful image editing capabilities guided by natural language prompts, unlocking new creative possibilities. However, they introduce significant ethical and legal risks, such as deepfakes and unauthorized use of copyrighted visual content. To address these risks, image immunization has emerged as a promising defense against AI-driven semantic manipulation. Yet, most existing approaches rely on image-specific adversarial perturbations that require individual optimization for each image, thereby limiting scalability and practicality. In this paper, we propose the first universal image immunization framework that generates a single, broadly applicable adversarial perturbation specifically designed for diffusion-based editing pipelines. Inspired by universal adversarial perturbation (UAP) techniques used in targeted attacks, our method…
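The abstract contrasts per-image optimization with a single universal perturbation. The sketch below illustrates only the generic UAP idea, not the authors' objective, which is truncated above. It optimizes one perturbation `delta` shared by a whole batch of images, using signed-gradient (PGD-style) steps and an $L_\infty$ projection. Everything model-related is a hypothetical stand-in: the linear encoder `W` replaces the diffusion model's feature extractor, and `target_feat` plays the role of the injected target semantics.

```python
import numpy as np


def injection_loss(images, delta, W, target_feat):
    """Mean squared distance between perturbed-image features and a target feature (toy)."""
    x_adv = np.clip(images + delta, 0.0, 1.0)      # the same delta is broadcast over every image
    residual = x_adv @ W.T - target_feat           # (n_images, n_features)
    return float(np.mean(np.sum(residual ** 2, axis=1)))


def universal_perturbation(images, W, target_feat, eps=8 / 255, step=1 / 255, iters=200):
    """Optimize ONE perturbation shared by all images (hypothetical toy setup)."""
    delta = np.zeros(images.shape[1])
    for _ in range(iters):
        pre_clip = images + delta
        x_adv = np.clip(pre_clip, 0.0, 1.0)
        residual = x_adv @ W.T - target_feat       # (n, k)
        # Exact gradient of the mean loss w.r.t. delta; pixels saturated by the clip get zero gradient.
        inside = (pre_clip > 0.0) & (pre_clip < 1.0)
        grad = np.mean(2.0 * (residual @ W) * inside, axis=0)
        # Signed-gradient descent step, then projection onto the L_inf ball of radius eps.
        delta = np.clip(delta - step * np.sign(grad), -eps, eps)
    return delta


rng = np.random.default_rng(0)
images = rng.uniform(0.0, 1.0, size=(16, 64))   # 16 flattened toy "images"
W = rng.normal(size=(8, 64)) / 8.0              # hypothetical linear encoder
target_feat = rng.normal(size=8)                # hypothetical target semantics

delta = universal_perturbation(images, W, target_feat)
print(injection_loss(images, np.zeros(64), W, target_feat))  # loss with no perturbation
print(injection_loss(images, delta, W, target_feat))         # loss with the single shared delta
```

The key design point is that the gradient is averaged over all images before each step. The result is therefore one `delta` that works across the batch, rather than one perturbation per image.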
Peer Reviews
Decision · Submitted to ICLR 2026
- Research on anti-editing is meaningful and promising.
- The proposed universal adversarial perturbation (UAP) demonstrates greater effectiveness compared to prior per-image optimization approaches.
- Experimental results show that the proposed method achieves improved performance in several cases.

- During the editing phase, does the proposed method need to append the target prompt (e.g., “Ronaldo”) to the editing prompt? If so, how can it guarantee that a malicious user would use that specific prompt? If not, how does the UAP maintain robustness across different editing prompts, given that it appears to be trained with a fixed target prompt?
- How well does the UAP generalize to complex or lengthy editing prompts? Does its effectiveness degrade under more complicated prompt conditions?

- The paper proposes a universal, data-free image immunization framework that generalizes across diffusion models.
- The method introduces a simple yet effective dual-loss design to achieve semantic-level defense.
- The approach demonstrates strong transferability and robustness under both white-box and black-box settings.
- The experiments cover multiple diffusion models and editing scenarios, showing consistent performance.

- The paper presents an empirical approach with limited theoretical justification.
- The authors do not provide a thorough discussion on why $\mathcal{L}_\text{inj}$ is effective in the cross-attention feature space or its theoretical justification, relying instead primarily on empirical validation.
- The evaluation relies heavily on pixel and perceptual similarity metrics, despite the method's core focus on semantic injection and suppression; adding CLIPScore or Grounding DINO detection would b

1. The proposed method enables universal protection using a single perturbation, making it significantly more practical and scalable compared to image-specific perturbations.
2. The paper is clearly written and easy to follow, with well-structured methodology and presentation.
3. The approach demonstrates broad applicability across diverse editing models, including Stable Diffusion v1.4 and v2.0, InstructPix2Pix, DiT, and inpainting pipelines.

1. While the method aims to inject target semantics, it is unclear whether the perturbation truly captures the intended concept. For instance, in the *cow* example of Figure 3 (Ours), the generated image still depicts a cow. Also, the perturbation appears to preserve only the **structure** of the *Ronaldo* target image, rather than semantic attributes like gender or identity.
2. The authors claim that text semantics are naturally fused into visual features at the cross-attention output level. Ho
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Adversarial Robustness in Machine Learning · Hate Speech and Cyberbullying Detection
