T2V-OptJail: Discrete Prompt Optimization for Text-to-Video Jailbreak Attacks
Jiayang Liu, Siyuan Liang, Shiqian Zhao, Rongcheng Tu, Wenbo Zhou, Aishan Liu, Dacheng Tao, Siew Kei Lam

TL;DR
This paper introduces T2V-OptJail, a novel discrete prompt optimization framework for text-to-video jailbreak attacks that systematically explores vulnerabilities and improves attack success rates while maintaining content control.
Contribution
It formalizes T2V jailbreak as a discrete optimization problem and proposes an iterative joint objective-based framework for more effective and controllable attacks.
Findings
Improves attack success rate by over 10% compared to previous methods.
Effectively balances attack success with content preservation.
Validated on multiple open-source and commercial T2V models.
Abstract
In recent years, fueled by the rapid advancement of diffusion models, text-to-video (T2V) generation models have achieved remarkable progress, with notable examples including Pika, Luma, Kling, and Open-Sora. Although these models exhibit impressive generative capabilities, they also expose significant security risks due to their vulnerability to jailbreak attacks, where the models are manipulated to produce unsafe content such as pornography, violence, or discrimination. Existing works such as T2VSafetyBench provide preliminary benchmarks for safety evaluation, but lack systematic methods for thoroughly exploring model vulnerabilities. To address this gap, we are the first to formalize the T2V jailbreak attack as a discrete optimization problem and propose a joint objective-based optimization framework, called T2V-OptJail. This framework consists of two key optimization goals:…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
Taxonomy
TopicsAdvanced Malware Detection Techniques · Network Security and Intrusion Detection · Internet Traffic Analysis and Secure E-voting
MethodsDiffusion
