PeerGuard: Defending Multi-Agent Systems Against Backdoor Attacks Through Mutual Reasoning

Falong Fan; Xi Li

arXiv:2505.11642 · cs.MA · May 28, 2025

TL;DR

This paper introduces PeerGuard, a defense for multi-agent systems in which agents evaluate one another's reasoning to identify agents compromised by backdoor attacks.

Contribution

It proposes a novel mutual reasoning-based method to identify poisoned agents in multi-agent systems, addressing a largely unexplored safety challenge.

Findings

01 High accuracy in identifying poisoned agents

02 Effective on LLM-based multi-agent systems built on the ChatGPT series and Llama 3

03 Few false positives on clean agents

Abstract

Multi-agent systems leverage advanced AI models as autonomous agents that interact, cooperate, or compete to complete complex tasks across applications such as robotics and traffic management. Despite their growing importance, safety in multi-agent systems remains largely underexplored, with most research focusing on single AI models rather than interacting agents. This work investigates backdoor vulnerabilities in multi-agent systems and proposes a defense mechanism based on agent interactions. By leveraging reasoning abilities, each agent evaluates responses from others to detect illogical reasoning processes, which indicate poisoned agents. Experiments on LLM-based multi-agent systems, including ChatGPT series and Llama 3, demonstrate the effectiveness of the proposed method, achieving high accuracy in identifying poisoned agents while minimizing false positives on clean agents. We…
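The peer-checking idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Response` type, the `toy_checker` rule (a string match standing in for an LLM judge), and the majority-vote aggregation in `flag_suspects` are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Response:
    agent: str      # name of the agent that produced the response
    reasoning: str  # free-text reasoning steps
    answer: str     # final answer the agent commits to


# A checker returns True when the reasoning supports the stated answer.
Checker = Callable[[Response], bool]


def toy_checker(resp: Response) -> bool:
    """Hypothetical stand-in for an LLM judge: the reasoning must
    conclude with the same text as the final answer."""
    return resp.reasoning.strip().lower().endswith(resp.answer.strip().lower())


def flag_suspects(
    responses: List[Response],
    checkers: Dict[str, Checker],
) -> Dict[str, bool]:
    """Each agent reviews every other agent's response; an agent is
    flagged when a strict majority of its reviewers find its reasoning
    inconsistent with its answer (assumed aggregation rule)."""
    flags: Dict[str, bool] = {}
    for target in responses:
        reviewers = [r.agent for r in responses if r.agent != target.agent]
        votes = [not checkers[name](target) for name in reviewers]
        flags[target.agent] = sum(votes) > len(votes) / 2
    return flags


responses = [
    Response("A", "2 + 2 equals 4, so the answer is 4", "4"),
    Response("B", "2 + 2 equals 4, so the answer is 4", "4"),
    # Agent C reasons correctly but emits a different answer, the kind of
    # reasoning/answer mismatch a backdoor trigger might cause.
    Response("C", "2 + 2 equals 4, so the answer is 5 - 1 = 4", "5"),
]
checkers = {r.agent: toy_checker for r in responses}
flags = flag_suspects(responses, checkers)
print(flags)
```

In a real system each entry in `checkers` would presumably wrap a call to the reviewing agent's own model; the string match here only keeps the sketch self-contained and runnable.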

Code & Models

Repositories

leongvan/peerguard (Official)

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Ethics and Social Impacts of AI

Methods: LLaMA