Demonstrations of Integrity Attacks in Multi-Agent Systems
Can Zheng, Yuhan Cao, Xiaoning Dong, Tianxing He

TL;DR
This paper investigates how malicious agents can subtly manipulate multi-agent systems through prompt attacks, revealing vulnerabilities in current detection methods and emphasizing the need for more secure MAS architectures.
Contribution
The study introduces four novel prompt-based attack strategies against multi-agent systems and demonstrates their effectiveness in misleading evaluations and bypassing advanced monitors.
Findings
Strategic prompt manipulation can bias MAS behavior systematically.
Current LLM-based monitors can be bypassed by crafted attacks.
Malicious agents can manipulate evaluations and collaboration in MAS.
Abstract
Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language understanding, code generation, and complex planning. Simultaneously, Multi-Agent Systems (MAS) have garnered attention for their potential to enable cooperation among distributed agents. However, from a multi-party perspective, MAS could be vulnerable to malicious agents that exploit the system to serve self-interests without disrupting its core functionality. This work explores integrity attacks where malicious agents employ subtle prompt manipulation to bias MAS operations and gain various benefits. Four types of attacks are examined: \textit{Scapegoater}, who misleads the system monitor to underestimate other agents' contributions; \textit{Boaster}, who misleads the system monitor to overestimate their own performance; \textit{Self-Dealer}, who manipulates other agents to adopt certain tools;…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdversarial Robustness in Machine Learning · Ethics and Social Impacts of AI · Explainable Artificial Intelligence (XAI)
MethodsSoftmax · Attention Is All You Need · ADaptive gradient method with the OPTimal convergence rate · Mixing Adam and SGD
