JPRO: Automated Multimodal Jailbreaking via Multi-Agent Collaboration Framework
Yuxuan Zhou, Yang Bai, Kuofeng Gao, Tao Dai, Shu-Tao Xia

TL;DR
JPRO is an automated multi-agent framework that significantly improves the diversity, scalability, and success rate of jailbreaking large multimodal vision-language models, revealing critical security vulnerabilities.
Contribution
It introduces a novel multi-agent collaborative framework with seed generation and adaptive optimization for effective, diverse, and scalable VLM jailbreaking.
Findings
Achieves over 60% attack success rate on multiple VLMs
Outperforms existing methods in attack diversity and scalability
Uncovers critical security vulnerabilities in multimodal models
Abstract
The widespread application of large VLMs makes ensuring their secure deployment critical. While recent studies have demonstrated jailbreak attacks on VLMs, existing approaches are limited: they require either white-box access, restricting practicality, or rely on manually crafted patterns, leading to poor sample diversity and scalability. To address these gaps, we propose JPRO, a novel multi-agent collaborative framework designed for automated VLM jailbreaking. It effectively overcomes the shortcomings of prior methods in attack diversity and scalability. Through the coordinated action of four specialized agents and its two core modules: Tactic-Driven Seed Generation and Adaptive Optimization Loop, JPRO generates effective and diverse attack samples. Experimental results show that JPRO achieves over a 60\% attack success rate on multiple advanced VLMs, including GPT-4o, significantly…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Security and Verification in Computing
