BlindGuard: Safeguarding LLM-based Multi-Agent Systems under Unknown Attacks

Rui Miao; Yixin Liu; Yili Wang; Xu Shen; Yue Tan; Yiwei Dai; Shirui Pan; Xin Wang

arXiv:2508.08127 · cs.AI · April 28, 2026

TL;DR

BlindGuard is an unsupervised defense framework for LLM-based multi-agent systems that detects malicious agents without prior attack knowledge, using hierarchical encoding and corruption-guided detection.

Contribution

BlindGuard introduces an unsupervised method that combines hierarchical agent encoding with corruption-guided detection to defend LLM-based multi-agent systems (MAS) against unknown attacks.

Findings

1. Effectively detects diverse attack types including prompt injection and memory poisoning.
2. Outperforms supervised baselines in generalizability across various communication patterns.
3. Maintains high detection accuracy without requiring labeled attack data.

Abstract

The security of LLM-based multi-agent systems (MAS) is critically threatened by propagation vulnerability, where malicious agents can distort collective decision-making through inter-agent message interactions. While existing supervised defense methods demonstrate promising performance, they may be impractical in real-world scenarios due to their heavy reliance on labeled malicious agents to train a supervised malicious detection model. To enable practical and generalizable MAS defenses, in this paper, we propose BlindGuard, an unsupervised defense method that learns without requiring any attack-specific labels or prior knowledge of malicious behaviors. To this end, we establish a hierarchical agent encoder to capture individual, neighborhood, and global interaction patterns of each agent, providing a comprehensive understanding for malicious agent detection. Meanwhile, we design a…
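The three views named in the abstract (individual, neighborhood, and global) can be pictured with a small sketch. This is not the authors' encoder: the function name `hierarchical_features`, the mean pooling, the undirected treatment of communication links, and the toy embeddings are all assumptions made for illustration. A learned encoder would replace each pooling step.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def mean(vectors: List[Vector], dim: int) -> Vector:
    """Element-wise mean; returns a zero vector when the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def hierarchical_features(
    embeddings: Dict[str, Vector],
    edges: List[Tuple[str, str]],
) -> Dict[str, Vector]:
    """Concatenate individual, neighborhood, and global views per agent.

    Hypothetical illustration only: mean pooling stands in for whatever
    learned encoder the paper actually uses.
    """
    dim = len(next(iter(embeddings.values())))
    neighbors: Dict[str, List[str]] = {agent: [] for agent in embeddings}
    for src, dst in edges:
        # Assumption: communication links are treated as undirected here.
        neighbors[src].append(dst)
        neighbors[dst].append(src)

    # Global view: one summary vector shared by every agent.
    global_view = mean(list(embeddings.values()), dim)

    features: Dict[str, Vector] = {}
    for agent, own in embeddings.items():
        # Neighborhood view: average of the agent's direct neighbors.
        neigh_view = mean([embeddings[n] for n in neighbors[agent]], dim)
        features[agent] = own + neigh_view + global_view
    return features


# Toy three-agent chain: a - b - c (made-up 2-dimensional embeddings).
toy_embeddings = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
toy_edges = [("a", "b"), ("b", "c")]
toy_features = hierarchical_features(toy_embeddings, toy_edges)
```

Each agent ends up with a vector three times the size of its own embedding. A detector could then score agents on these combined features without needing labels, which matches the unsupervised setting the abstract describes.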

Code & Models

Repositories

MR9812/BlindGuard (GitHub)
