VII: Visual Instruction Injection for Jailbreaking Image-to-Video Generation Models
Bowen Zheng, Yongli Xiang, Ziming Hong, Zerong Lin, Chaojian Yu, Tongliang Liu, Xinge You

TL;DR
This paper reveals a new security risk in image-to-video models where malicious instructions can be covertly embedded in images to generate harmful videos, and proposes a framework to demonstrate and exploit this vulnerability.
Contribution
The paper introduces Visual Instruction Injection (VII), a training-free method that disguises malicious prompts as benign visual instructions, exposing a significant security flaw in I2V models.
Findings
VII achieves an attack success rate of up to 83.5% on state-of-the-art I2V models.
It reduces refusal rates to near zero, so the models generate the malicious content instead of declining.
It outperforms existing baselines for jailbreaking image-to-video models.
Abstract
Image-to-Video (I2V) generation models, which condition video generation on reference images, have shown emerging visual instruction-following capability, allowing certain visual cues in reference images to act as implicit control signals for video generation. However, this capability also introduces a previously overlooked risk: adversaries may exploit visual instructions to inject malicious intent through the image modality. In this work, we uncover this risk by proposing Visual Instruction Injection (VII), a training-free and transferable jailbreaking framework that intentionally disguises the malicious intent of unsafe text prompts as benign visual instructions in the safe reference image. Specifically, VII coordinates a Malicious Intent Reprogramming module to distill malicious intent from unsafe text prompts while minimizing their static harmfulness, and a Visual Instruction…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis · Advanced Malware Detection Techniques
