Evo-MARL: Co-Evolutionary Multi-Agent Reinforcement Learning for Internalized Safety

Zhenyu Pan; Yiting Zhang; Yutong Zhang; Jianshu Zhang; Haozheng Luo; Yuwei Han; Dennis Wu; Hong-Yu Chen; Philip S. Yu; Manling Li; Han Liu

arXiv:2508.03864·cs.AI·September 9, 2025

TL;DR

Evo-MARL introduces a co-evolutionary multi-agent reinforcement learning framework that internalizes safety, enabling agents to jointly develop defensive capabilities against adversarial threats without external modules.

Contribution

The paper presents a novel co-evolutionary MARL approach that internalizes safety mechanisms within the agents themselves, reducing reliance on external safety modules and improving robustness.

Findings

01. Reduces attack success rates by up to 22%.

02. Increases reasoning accuracy by up to 5%.

03. Demonstrates improved safety and utility in multi-agent systems.

Abstract

Multi-agent systems (MAS) built on multimodal large language models exhibit strong collaboration and performance. However, their growing openness and interaction complexity pose serious risks, notably jailbreak and adversarial attacks. Existing defenses typically rely on external guard modules, such as dedicated safety agents, to handle unsafe behaviors. Unfortunately, this paradigm faces two challenges: (1) standalone agents offer limited protection, and (2) their independence creates a single point of failure: if they are compromised, system-wide safety collapses. Naively increasing the number of guard agents further raises cost and complexity. To address these challenges, we propose Evo-MARL, a novel multi-agent reinforcement learning (MARL) framework that enables all task agents to jointly acquire defensive capabilities. Rather than relying on external safety modules, Evo-MARL trains each agent…
