A Theory of Mind Approach as Test-Time Mitigation Against Emergent Adversarial Communication
Nancirose Piazza, Vahid Behzadan

TL;DR
This paper introduces a Theory-of-Mind-based method to detect and mitigate adversarial communication in cooperative multi-agent systems at test time, making them more robust to sabotage.
Contribution
It proposes a novel ToM-based approach for test-time defense against adversarial communication in cooperative multi-agent reinforcement learning (CoMARL) and demonstrates its effectiveness in empirical evaluations.
Findings
Effective detection of adversarial messages using ToM-based techniques
Improved cooperative performance in multi-agent systems
Feasibility demonstrated in benchmark environments
Abstract
Multi-Agent Systems (MAS) is the study of interactions among multiple agents in a shared environment. In partially observable environments, communication is a fundamental mechanism through which cooperating agents share information. Cooperative Multi-Agent Reinforcement Learning (CoMARL) is a learning framework in which agent policies are learned either with explicit cooperative mechanisms or so that they exhibit cooperative behavior. Prior work has studied how CoMARL agents can learn to communicate; however, non-cooperative agents that gain access to a cooperative team's communication channel have been shown to learn adversarial messages that sabotage the team's performance, particularly when objectives depend on finite resources. To address this issue, we propose a technique which leverages local formulations of Theory-of-Mind (ToM) to distinguish exhibited cooperative…
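The abstract is cut off before the method is described, so what follows is only a minimal sketch of the general idea of ToM-based message screening, not the authors' algorithm. It assumes (hypothetically) that each receiving agent holds an estimate of every sender's belief state and a model, `predict_message`, of the message a cooperative teammate would send from that belief. Messages that deviate from this prediction by more than a hand-chosen `threshold` are treated as suspected adversarial and dropped. All names here are illustrative.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def l2_distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length message vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def filter_messages(
    messages: Dict[str, Vector],
    sender_beliefs: Dict[str, Vector],
    predict_message: Callable[[Vector], Vector],
    threshold: float,
) -> Dict[str, Vector]:
    """Keep only messages consistent with the receiver's model of a cooperative sender.

    Hypothetical ToM-style check: for each sender, predict the message a
    cooperative agent with the estimated belief would send, and drop the
    received message if it lies farther than `threshold` from that prediction.
    """
    kept: Dict[str, Vector] = {}
    for sender, msg in messages.items():
        expected = predict_message(sender_beliefs[sender])
        if l2_distance(msg, expected) <= threshold:
            kept[sender] = msg
    return kept


def predict(belief: Vector) -> List[float]:
    # Toy cooperative-sender model: a teammate simply reports its belief.
    return list(belief)


# Usage: "adv" sends a message far from what a cooperative sender would say.
messages = {"a1": [1.0, 0.0], "adv": [0.0, 5.0]}
beliefs = {"a1": [1.0, 0.1], "adv": [0.0, 0.0]}
kept = filter_messages(messages, beliefs, predict, threshold=0.5)
```

In this toy run the message from `a1` is within 0.1 of its prediction and is kept, while the message from `adv` is 5.0 away and is dropped. A fixed threshold is the simplest design choice; a learned or per-sender calibrated threshold would be a natural alternative under the same assumptions.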
Taxonomy
Topics: Evolutionary Game Theory and Cooperation · Psychology of Moral and Emotional Judgment
