Byzantine Robust Cooperative Multi-Agent Reinforcement Learning as a Bayesian Game
Simin Li, Jun Guo, Jingqiao Xiu, Ruixiao Xu, Xin Yu, Jiakai Wang, Aishan Liu, Yaodong Yang, Xianglong Liu

TL;DR
This paper frames robust cooperative multi-agent reinforcement learning as a Bayesian game. Agents keep posterior beliefs about which teammates may be Byzantine (malfunctioning or adversarial) and act on those beliefs, so they cooperate with likely allies while limiting their exposure to adversarial manipulation.
Contribution
It proposes the BARDec-POMDP framework and a two-timescale actor-critic algorithm to achieve ex post robust equilibrium with proven convergence, advancing robustness in MARL.
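The "two-timescale" idea can be illustrated generically. The critic learns with a larger step size α_k and the actor with a smaller step size β_k, chosen so that β_k / α_k → 0 and the critic effectively sees a quasi-static policy. The sketch below is a toy single-agent softmax actor-critic on a deterministic bandit, not the paper's BARDec-POMDP algorithm. The function names, the schedules α_k = (k+1)^-0.6 and β_k = 1/(k+1), and the bandit setup are all assumptions chosen for illustration.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def two_timescale_actor_critic(rewards=(1.0, 0.0), steps=20_000, seed=0):
    """Hypothetical toy two-timescale actor-critic on a deterministic bandit.

    Critic: tabular action values, updated with the larger step size alpha_k.
    Actor: softmax logits, updated with the smaller step size beta_k, so that
    beta_k / alpha_k -> 0 (the critic runs on the faster timescale).
    """
    rng = random.Random(seed)
    n = len(rewards)
    q = [0.0] * n        # critic: estimated reward of each action
    logits = [0.0] * n   # actor: softmax policy parameters
    for k in range(steps):
        alpha = 1.0 / (k + 1) ** 0.6   # fast timescale (critic)
        beta = 1.0 / (k + 1)           # slow timescale (actor)
        pi = softmax(logits)
        a = rng.choices(range(n), weights=pi)[0]
        # Critic: move Q(a) toward the observed reward.
        q[a] += alpha * (rewards[a] - q[a])
        # Actor: exact softmax policy gradient, pi_b * (Q_b - V), using the critic's Q.
        v = sum(p * qa for p, qa in zip(pi, q))
        logits = [th + beta * p * (qa - v) for th, p, qa in zip(logits, pi, q)]
    return softmax(logits), q
```

Calling `two_timescale_actor_critic()` returns the final policy and critic values. The policy should place most of its mass on the first (higher-reward) arm. Because the actor's step size shrinks faster than the critic's, each actor update uses value estimates that have nearly caught up with the current policy. This separation is what two-timescale convergence arguments rely on.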
Findings
Successful in matrix games, foraging, and StarCraft II under worst-case attacks.
Achieves resilient micromanagement skills and adaptive ally alignment.
Outperforms previous robust MARL approaches in adversarial settings.
Abstract
In this study, we explore the robustness of cooperative multi-agent reinforcement learning (c-MARL) against Byzantine failures, where any agent can enact arbitrary, worst-case actions due to malfunction or adversarial attack. To address the uncertainty that any agent can be adversarial, we propose a Bayesian Adversarial Robust Dec-POMDP (BARDec-POMDP) framework, which views Byzantine adversaries as nature-dictated types, represented by a separate transition. This allows agents to learn policies grounded in their posterior beliefs about the types of other agents, fostering collaboration with identified allies and minimizing vulnerability to adversarial manipulation. We define the optimal solution to the BARDec-POMDP as an ex post robust Bayesian Markov perfect equilibrium, which we prove exists and weakly dominates the equilibrium of previous robust MARL approaches. To realize this…
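Acting on "posterior beliefs about the types of other agents" requires a belief update. A minimal sketch, assuming only two types per teammate (ally or Byzantine adversary) and known likelihoods of each observed action under both types, is a sequential application of Bayes' rule. The helper names and the two-type likelihood model are hypothetical simplifications, not the paper's exact belief model.

```python
from typing import Iterable, List, Tuple


def update_belief(prior_adv: float, p_if_ally: float, p_if_adv: float) -> float:
    """One Bayes-rule step: P(teammate is adversarial | observed action)."""
    joint_adv = prior_adv * p_if_adv            # P(adv) * P(action | adv)
    joint_ally = (1.0 - prior_adv) * p_if_ally  # P(ally) * P(action | ally)
    evidence = joint_adv + joint_ally
    if evidence == 0.0:
        # The action is impossible under both type models; keep the prior.
        return prior_adv
    return joint_adv / evidence


def belief_trajectory(prior_adv: float,
                      likelihoods: Iterable[Tuple[float, float]]) -> List[float]:
    """Update the belief over a history of (P(a | ally), P(a | adv)) pairs."""
    belief = prior_adv
    history = []
    for p_if_ally, p_if_adv in likelihoods:
        belief = update_belief(belief, p_if_ally, p_if_adv)
        history.append(belief)
    return history
```

For example, with a prior of 0.1 that a teammate is adversarial, observing twice an action that an ally would take with probability 0.2 but an adversary with probability 0.8 raises the belief to 4/13 ≈ 0.31 and then to 0.64. A belief-conditioned policy could then rely less on that teammate as the belief grows.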
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Reinforcement Learning in Robotics
