Hierarchical Attacks for Multi-Modal Multi-Agent Reasoning

Hao Zhou; Tiru Wu; Yan Jiang; Wanqi Zhou; Junxing Hu; Ai Han

arXiv:2605.13213 · cs.AI · May 15, 2026

TL;DR

This paper introduces HAM$^{3}$, a hierarchical attack framework targeting multi-modal multi-agent systems across perception, communication, and reasoning layers, revealing vulnerabilities and guiding robustness improvements.

Contribution

The paper presents HAM$^{3}$, a hierarchical attack framework designed for multi-modal multi-agent systems. Prior studies of adversarial attacks on multi-agent systems focus mainly on isolated agents or unimodal settings, a gap this framework addresses.

Findings

01

HAM$^{3}$ achieves an attack success rate of up to 78.3%.

02

Reasoning-layer attacks are the most effective.

03

Over half of successful attacks cause multiple agents to make consistent errors.

Abstract

Multi-modal multi-agent systems (MM-MAS) have gained increasing attention for their capacity to enable complex reasoning and coordination across diverse modalities. As these systems continue to expand in scale and functionality, investigating their potential vulnerabilities has become increasingly important. However, existing studies on adversarial attacks in multi-agent systems primarily focus on isolated agents or unimodal settings, leaving the vulnerabilities of MM-MAS largely underexplored. To bridge this gap, we introduce HAM$^{3}$, a Hierarchical Attack framework for multi-modal multi-agent systems that decomposes attacks into three interconnected layers. Specifically, at the perception layer, HAM$^{3}$ mounts attacks by perturbing visual inputs, textual inputs, and their fused visual-textual representations. At the communication layer, it performs communication-level attacks that…
