Visual Exclusivity Attacks: Automatic Multimodal Red Teaming via Agentic Planning
Yunbei Zhang, Yingqiang Ge, Weijie Xu, Yuhui Xu, Jihun Hamm, Chandan K. Reddy

TL;DR
This paper introduces a novel multimodal attack framework called VE and MM-Plan that significantly improves the success rate of visual content-based attacks on advanced AI models, exposing safety vulnerabilities.
Contribution
It proposes a resilient visual threat model and a planning-based attack framework that outperforms existing methods in attacking large multimodal models.
Findings
MM-Plan achieves 46.3% success against Claude 4.5 Sonnet.
It attains 13.8% success against GPT-5.
The approach outperforms baselines by 2-5 times.
Abstract
Current multimodal red teaming treats images as wrappers for malicious payloads via typography or adversarial noise. These attacks are structurally brittle, as standard defenses neutralize them once the payload is exposed. We introduce Visual Exclusivity (VE), a more resilient Image-as-Basis threat where harm emerges only through reasoning over visual content such as technical schematics. To systematically exploit VE, we propose Multimodal Multi-turn Agentic Planning (MM-Plan), a framework that reframes jailbreaking from turn-by-turn reaction to global plan synthesis. MM-Plan trains an attacker planner to synthesize comprehensive, multi-turn strategies, optimized via Group Relative Policy Optimization (GRPO), enabling self-discovery of effective strategies without human supervision. To rigorously benchmark this reasoning-dependent threat, we introduce VE-Safety, a human-curated dataset…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdversarial Robustness in Machine Learning · Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis
