Catching the Infection Before It Spreads: Foresight-Guided Defense in Multi-Agent Systems
Yue Ma, Ziyuan Yang, Yi Zhang

TL;DR
This paper introduces a foresight-guided local purification framework for multi-agent systems to effectively detect and eliminate infections, significantly reducing infection spread while preserving interaction diversity.
Contribution
It proposes a training-free, localized infection detection and purification method using future interaction reasoning and multi-persona simulation in multi-agent systems.
Findings
Foresight-Guided Local Purification (FLP) reduces the maximum infection rate from over 95% to below 5.47%.
Retrieval and semantic metrics closely match benign baselines.
Interaction diversity is preserved while infection is controlled.
Abstract
Large multimodal model-based Multi-Agent Systems (MASs) enable collaborative complex problem solving through specialized agents. However, MASs are vulnerable to infectious jailbreaks, in which the compromise of a single agent spreads to others and leads to widespread compromise. Existing defenses counter this by training a more contagious cure factor, biasing agents to retrieve it over virus adversarial examples (VirAEs). However, this homogenizes agent responses, providing only superficial suppression rather than true recovery. We revisit these defenses and observe that they operate globally via a shared cure factor, whereas infectious jailbreaks arise from localized interaction behaviors. This mismatch limits their effectiveness. To address this, we propose a training-free Foresight-Guided Local Purification (FLP) framework, where each agent reasons over future interactions to track behavioral evolution and…
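To make the spreading dynamic described in the abstract concrete, here is a minimal toy sketch in Python. It is not the paper's method or experimental setup: the random pairwise-chat model, the function name `simulate_spread`, and the parameters `num_agents`, `rounds`, and `transmit_prob` are all assumptions chosen for illustration. It only shows how a single compromised agent can, through local interactions alone, raise the infection rate across the whole system.

```python
import random


def simulate_spread(num_agents=64, rounds=20, transmit_prob=1.0, seed=0):
    """Toy model (an assumption, not the paper's setup): agents are randomly
    paired each round, and an infected agent infects its partner with
    probability `transmit_prob`. Returns the infection rate after each round,
    starting with the initial rate."""
    rng = random.Random(seed)
    infected = [False] * num_agents
    infected[0] = True  # a single compromised agent seeds the infection
    history = [sum(infected) / num_agents]

    for _ in range(rounds):
        order = list(range(num_agents))
        rng.shuffle(order)
        # Pair consecutive agents in the shuffled order for one chat each.
        for a, b in zip(order[0::2], order[1::2]):
            # Transmission is only possible when exactly one side is infected.
            if infected[a] != infected[b] and rng.random() < transmit_prob:
                infected[a] = infected[b] = True
        history.append(sum(infected) / num_agents)

    return history


if __name__ == "__main__":
    for round_index, rate in enumerate(simulate_spread()):
        print(f"round {round_index:2d}: infection rate = {rate:.3f}")
```

In this toy model the infection rate can at most double per round, so growth is roughly exponential until most agents are infected. Because infection moves only through individual chats, the sketch also illustrates the abstract's point that the threat arises from localized interaction behaviors.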
