TreeTeaming: Autonomous Red-Teaming of Vision-Language Models via Hierarchical Strategy Exploration
Chunxiao Li, Lijun Li, Jing Shao

TL;DR
TreeTeaming is a framework that uses LLM-driven hierarchical strategy exploration to autonomously discover diverse and effective vulnerabilities in vision-language models (VLMs), outperforming existing red-teaming methods.
Contribution
It introduces a dynamic, evolutionary approach to red teaming that constructs a strategy tree for more comprehensive vulnerability discovery in VLMs.
Findings
Achieves state-of-the-art attack success rates on 11 out of 12 VLMs.
Demonstrates superior strategic diversity compared to previous jailbreak strategies.
Reduces attack toxicity by an average of 23.09%, enhancing stealthiness.
Abstract
The rapid advancement of Vision-Language Models (VLMs) has brought their safety vulnerabilities into sharp focus. However, existing red teaming methods are fundamentally constrained by an inherent linear exploration paradigm, confining them to optimizing within a predefined strategy set and preventing the discovery of novel, diverse exploits. To transcend this limitation, we introduce TreeTeaming, an automated red teaming framework that reframes strategy exploration from static testing to a dynamic, evolutionary discovery process. At its core lies a strategic Orchestrator, powered by a Large Language Model (LLM), which autonomously decides whether to evolve promising attack paths or explore diverse strategic branches, thereby dynamically constructing and expanding a strategy tree. A multimodal actuator is then tasked with executing these complex strategies. In the experiments across 12…
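The abstract describes an Orchestrator that either evolves a promising path or branches out to a different strategy, growing a strategy tree as it goes. The following is a minimal structural sketch of that idea only, not the authors' implementation. Every name here (`StrategyNode`, `decide`, `step`) is hypothetical, the `score` field is a placeholder number, and a fixed threshold rule stands in for the LLM-driven decision the paper describes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StrategyNode:
    """One node in a hypothetical strategy tree (illustrative only)."""
    name: str
    score: float = 0.0  # placeholder effectiveness estimate in [0, 1]
    children: List["StrategyNode"] = field(default_factory=list)
    # Excluded from repr/compare to avoid parent <-> child cycles.
    parent: Optional["StrategyNode"] = field(
        default=None, repr=False, compare=False
    )

    def add_child(self, name: str, score: float = 0.0) -> "StrategyNode":
        child = StrategyNode(name=name, score=score, parent=self)
        self.children.append(child)
        return child

    def depth(self) -> int:
        """Number of edges between this node and the root."""
        d, node = 0, self
        while node.parent is not None:
            d += 1
            node = node.parent
        return d


def decide(node: StrategyNode, threshold: float = 0.5) -> str:
    """Assumed stand-in for the Orchestrator: a simple threshold rule.

    Returns "evolve" for a promising node, otherwise "explore".
    """
    return "evolve" if node.score >= threshold else "explore"


def step(node: StrategyNode, new_name: str, new_score: float,
         threshold: float = 0.5) -> StrategyNode:
    """Grow the tree by one node.

    "evolve" deepens the current path (child of `node`);
    "explore" adds a sibling branch (child of `node.parent`).
    The root has no parent, so it is always deepened.
    """
    if decide(node, threshold) == "evolve" or node.parent is None:
        return node.add_child(new_name, new_score)
    return node.parent.add_child(new_name, new_score)


root = StrategyNode("root")
a = step(root, "strategy-a", 0.8)   # root is always deepened
b = step(a, "strategy-a1", 0.2)     # a is promising: deepen below a
c = step(b, "strategy-a2", 0.6)     # b is weak: branch out as a sibling of b
```

In this sketch, deepening produces a longer path while exploring widens the tree at the parent level, which is the contrast with a linear, fixed strategy list that the abstract draws.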
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Hate Speech and Cyberbullying Detection · Advanced Malware Detection Techniques
