Certifiably Robust Policy Learning against Adversarial Communication in Multi-agent Systems
Yanchao Sun, Ruijie Zheng, Parisa Hassanzadeh, Yongyuan Liang, Soheil Feizi, Sumitra Ganesh, Furong Huang

TL;DR
This paper introduces a certifiably robust policy learning method for multi-agent systems that defends against adversarial communication manipulation, ensuring safety and robustness in noisy or malicious environments.
Contribution
The paper proposes a novel message-ensemble policy that provides certifiable robustness against adversarial communication attacks in multi-agent reinforcement learning.
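Here is a rough illustration of what a message-ensemble policy can look like. This is a minimal sketch of the general ablate-and-vote idea, assuming discrete actions and a hypothetical `base_policy` that acts on the agent's own observation plus a subset of its received messages. The victim agent runs the base policy on every k-subset of its incoming messages and takes the majority action. The names `message_ensemble_action`, `base_policy`, and `k` are illustrative and are not the authors' API or exact construction.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, Hashable, Sequence, Tuple

# Hypothetical base policy: (own observation, kept messages) -> discrete action.
BasePolicy = Callable[[object, Tuple[object, ...]], Hashable]


def message_ensemble_action(
    base_policy: BasePolicy,
    observation: object,
    messages: Sequence[object],
    k: int,
) -> Hashable:
    """Majority vote of base_policy over every k-subset of the received messages.

    Each vote sees only k of the len(messages) messages, so a message from a
    corrupted sender can influence only the subsets that contain it.
    """
    if not 0 <= k <= len(messages):
        raise ValueError("k must be between 0 and the number of messages")
    votes = Counter(
        base_policy(observation, tuple(messages[i] for i in subset))
        for subset in combinations(range(len(messages)), k)
    )
    # most_common orders by count; equal counts keep first-encountered order.
    return votes.most_common(1)[0][0]
```

Suppose there are four received messages, one of them forged to report an obstacle, and a toy base policy that stops whenever it sees "obstacle". With k = 1, three of the four single-message votes say "go", so the forged message is outvoted. With k = 4, the single vote sees the forged message and returns "stop". Smaller k limits how many votes one corrupted sender can reach, at the cost of each vote seeing less information.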
Findings
Significant improvement in robustness against communication attacks.
Effective in multiple environments with various attack types.
Theoretical robustness guarantees that hold regardless of the attack strategy, provided the number of agents whose messages are corrupted is bounded.
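The counting argument below gives some intuition for why ablation plus voting can yield guarantees that do not depend on the attack strategy. It is a generic vote-margin check, not the paper's theorem statement. The sketch assumes that the ensemble enumerates all k-subsets of the received messages and that the attacker controls messages from at most `num_corrupted` senders. Only subsets containing a corrupted message can change their vote, and each such change shrinks the gap between the top action and any rival by at most 2. The function names are hypothetical.

```python
from math import comb


def contaminated_subsets(num_messages: int, num_corrupted: int, k: int) -> int:
    """Number of k-subsets of the messages that include at least one corrupted one."""
    if not 0 <= num_corrupted <= num_messages or k < 0:
        raise ValueError("need 0 <= num_corrupted <= num_messages and k >= 0")
    return comb(num_messages, k) - comb(num_messages - num_corrupted, k)


def vote_is_certified(
    top_votes: int,
    runner_up_votes: int,
    num_messages: int,
    num_corrupted: int,
    k: int,
) -> bool:
    """True if no attack on num_corrupted senders can flip the majority action.

    Each contaminated subset changes at most one vote, which lowers the top
    action's count by at most 1 and raises a rival's by at most 1. The bound is
    symmetric, so the check can be run on the votes actually observed.
    """
    b = contaminated_subsets(num_messages, num_corrupted, k)
    return top_votes - runner_up_votes > 2 * b
```

Take 10 received messages and one corrupted sender. With k = 1, only 1 of 10 votes is exposed, so the required margin is more than 2 votes. With k = 2, 9 of 45 votes are exposed and the margin must exceed 18.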
Abstract
Communication is important in many multi-agent reinforcement learning (MARL) problems for agents to share information and make good decisions. However, when deploying trained communicative agents in a real-world application where noise and potential attackers exist, the safety of communication-based policies becomes a severe issue that is underexplored. Specifically, if communication messages are manipulated by malicious attackers, agents relying on untrustworthy communication may take unsafe actions that lead to catastrophic consequences. Therefore, it is crucial to ensure that agents will not be misled by corrupted communication, while still benefiting from benign communication. In this work, we consider an environment with N agents, where the attacker may arbitrarily change the communication from any C < (N-1)/2 agents to a victim agent. For this strong threat model, we…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Terrorism, Counterterrorism, and Political Violence · Hate Speech and Cyberbullying Detection
