MA2CL: Masked Attentive Contrastive Learning for Multi-Agent Reinforcement Learning
Haolin Song, Mingxiao Feng, Wengang Zhou, Houqiang Li

TL;DR
MA2CL introduces a contrastive learning framework that enhances multi-agent reinforcement learning by reconstructing masked agent observations, leading to improved cooperation, performance, and sample efficiency in vision-based scenarios.
Contribution
The paper proposes MA2CL, a novel masked attentive contrastive learning method that incorporates agent-level information for better representation learning in MARL.
Findings
Significantly improves MARL performance in various scenarios.
Enhances sample efficiency of reinforcement learning algorithms.
Outperforms existing methods in vision-based and state-based tasks.
Abstract
Recent approaches have used self-supervised auxiliary tasks for representation learning to improve the performance and sample efficiency of vision-based reinforcement learning algorithms in single-agent settings. In multi-agent reinforcement learning (MARL), however, these techniques face a challenge: each agent receives only a partial observation of an environment that the other agents also influence, so observations are correlated along the agent dimension. Representation learning for MARL therefore needs to take agent-level information into account. In this paper, we propose an effective framework called Multi-Agent Masked Attentive Contrastive Learning (MA2CL), which encourages the learned representations to be both temporally and agent-level predictive by reconstructing masked agent observations in latent space. Specifically, we use an…
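The abstract excerpt stops before describing the architecture ("Specifically, we use an…"), so the following is only a minimal NumPy sketch of the general idea it names: mask some agents, rebuild their latents from the remaining agents with attention, and score the rebuilt latents with a contrastive (InfoNCE) loss against target latents of the unmasked observations. Every concrete choice here is an assumption, not the authors' design: the linear encoders stand in for the CNN a vision-based setup would use; the helper names (`sample_agent_mask`, `attentive_reconstruct`, `info_nce`, `masked_contrastive_loss`, `init_params`) are hypothetical; masking uses a mask token in latent space; the negatives are the other agents at the same timestep; and the temporal component (for example conditioning on actions or predicting future latents) is omitted. Only the forward loss is shown, with no gradients or training loop.

```python
import numpy as np


def sample_agent_mask(n_agents, mask_ratio, rng):
    """Boolean mask over agents (True = masked); always masks at least one agent."""
    n_mask = max(1, int(round(mask_ratio * n_agents)))
    mask = np.zeros(n_agents, dtype=bool)
    mask[rng.choice(n_agents, size=n_mask, replace=False)] = True
    return mask


def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # shift for numerical stability
    return e / e.sum(axis=axis, keepdims=True)


def log_softmax(x, axis=-1):
    m = x.max(axis=axis, keepdims=True)
    return x - m - np.log(np.exp(x - m).sum(axis=axis, keepdims=True))


def attentive_reconstruct(z, mask, params):
    """Hypothetical reconstruction module: swap masked agents' latents for a
    mask token, then apply one single-head self-attention layer (with a
    residual connection) across the agent axis, so each masked slot is
    filled in from the other agents' latents."""
    tokens = np.where(mask[:, None], params["mask_token"], z)
    q, k, v = tokens @ params["w_q"], tokens @ params["w_k"], tokens @ params["w_v"]
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (n_agents, n_agents)
    return tokens + attn @ v


def info_nce(pred, target, mask, temperature=0.1):
    """InfoNCE over the masked rows: agent i's positive is target[i] and the
    other agents' targets act as negatives (cosine similarity / temperature)."""
    pred = pred / np.linalg.norm(pred, axis=-1, keepdims=True)
    target = target / np.linalg.norm(target, axis=-1, keepdims=True)
    logits = pred @ target.T / temperature
    log_probs = np.diag(log_softmax(logits, axis=1))
    return float(-log_probs[mask].mean())


def masked_contrastive_loss(obs, params, mask_ratio=0.5, seed=0):
    """obs: (n_agents, obs_dim) joint observation at a single timestep."""
    rng = np.random.default_rng(seed)
    mask = sample_agent_mask(obs.shape[0], mask_ratio, rng)
    z_online = obs @ params["w_enc"]     # online encoder (linear stand-in for a CNN)
    z_target = obs @ params["w_target"]  # target encoder sees the unmasked observations
    z_rec = attentive_reconstruct(z_online, mask, params)
    return info_nce(z_rec, z_target, mask)


def init_params(obs_dim, latent_dim, seed=0):
    rng = np.random.default_rng(seed)
    shapes = {"w_enc": (obs_dim, latent_dim), "w_q": (latent_dim, latent_dim),
              "w_k": (latent_dim, latent_dim), "w_v": (latent_dim, latent_dim),
              "mask_token": (latent_dim,)}
    params = {name: rng.normal(scale=0.1, size=s) for name, s in shapes.items()}
    # Target encoder starts as a copy of the online encoder (assumption).
    params["w_target"] = params["w_enc"].copy()
    return params


if __name__ == "__main__":
    params = init_params(obs_dim=8, latent_dim=16)
    obs = np.random.default_rng(1).normal(size=(4, 8))  # 4 agents, 8-dim observations
    print(masked_contrastive_loss(obs, params))
```

In an actual training setup the loss would be computed in an autodiff framework over a batch of timesteps, so negatives could also come from other samples, and the target encoder would commonly be a slowly updated (for example EMA) copy of the online encoder, as in other contrastive RL methods; the excerpt above does not confirm any of these choices for MA2CL.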
Taxonomy
Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Adaptive Dynamic Programming Control
