Evo-MARL: Co-Evolutionary Multi-Agent Reinforcement Learning for Internalized Safety
Zhenyu Pan, Yiting Zhang, Yutong Zhang, Jianshu Zhang, Haozheng Luo, Yuwei Han, Dennis Wu, Hong-Yu Chen, Philip S. Yu, Manling Li, Han Liu

TL;DR
Evo-MARL introduces a co-evolutionary multi-agent reinforcement learning framework that internalizes safety, enabling agents to jointly develop defensive capabilities against adversarial threats without external modules.
Contribution
It presents a novel co-evolutionary MARL approach that internalizes safety mechanisms within agents, reducing reliance on external safety modules and enhancing robustness.
Findings
Reduces attack success rates by up to 22%.
Increases reasoning accuracy by up to 5%.
Demonstrates improved safety and utility in multi-agent systems.
Abstract
Multi-agent systems (MAS) built on multimodal large language models exhibit strong collaboration and performance. However, their growing openness and interaction complexity pose serious risks, notably jailbreak and adversarial attacks. Existing defenses typically rely on external guard modules, such as dedicated safety agents, to handle unsafe behaviors. Unfortunately, this paradigm faces two challenges: (1) standalone agents offer limited protection, and (2) their independence creates a single point of failure: if a guard is compromised, system-wide safety collapses. Naively increasing the number of guard agents further raises cost and complexity. To address these challenges, we propose Evo-MARL, a novel multi-agent reinforcement learning (MARL) framework that enables all task agents to jointly acquire defensive capabilities. Rather than relying on external safety modules, Evo-MARL trains each agent…
