MAD-Spear: A Conformity-Driven Prompt Injection Attack on Multi-Agent Debate Systems
Yu Cui, Hongyang Du

TL;DR
This paper introduces MAD-Spear, a prompt injection attack targeting multi-agent debate systems, revealing their vulnerability to misinformation propagation and emphasizing the need for enhanced security measures.
Contribution
We propose MAD-Spear, a novel attack method that significantly disrupts MAD systems, and establish a formal framework to evaluate their fault-tolerance against such attacks.
Findings
MAD-Spear outperforms baseline attacks in degrading MAD performance
Agent diversity can increase vulnerability in mathematical reasoning tasks
MAD systems are susceptible to misinformation propagation under attack
Abstract
Multi-agent debate (MAD) systems leverage collaborative interactions among large language model (LLM) agents to improve reasoning capabilities. While recent studies have focused on increasing the accuracy and scalability of MAD systems, their security vulnerabilities have received limited attention. In this work, we introduce MAD-Spear, a targeted prompt injection attack that compromises a small subset of agents but significantly disrupts the overall MAD process. Manipulated agents produce multiple plausible yet incorrect responses, exploiting LLMs' conformity tendencies to propagate misinformation and degrade consensus quality. Furthermore, the attack can be composed with other strategies, such as communication attacks, to further amplify its impact by increasing the exposure of agents to incorrect responses. To assess MAD's resilience under attack, we propose a formal definition of…
Taxonomy
Topics: Network Security and Intrusion Detection · Advanced Malware Detection Techniques · Multi-Agent Systems and Negotiation
