MACRPO: Multi-Agent Cooperative Recurrent Policy Optimization
Eshagh Kargar, Ville Kyrki

TL;DR
MACRPO introduces a novel multi-agent reinforcement learning method that enhances cooperation and information sharing through recurrent networks and a new advantage function, outperforming existing algorithms in complex environments.
Contribution
This paper presents MACRPO, a new multi-agent actor-critic algorithm with recurrent layers and a novel advantage function, improving cooperation in partially observable, non-stationary settings.
Findings
MACRPO outperforms state-of-the-art algorithms such as QMIX and MADDPG.
Recurrent critic with meta-trajectory training improves cooperation.
Incorporating other agents' rewards enhances learning efficiency.
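To make the last finding concrete, here is a minimal sketch of an advantage estimate that mixes in other agents' rewards and values. It is an illustration under stated assumptions, not the paper's formula: it assumes that rewards and critic values are simply averaged across agents and then passed through a standard generalized advantage estimation (GAE) recursion. The function name `team_gae` and the parameters `gamma` and `lam` are hypothetical.

```python
from typing import List, Sequence


def team_gae(rewards: Sequence[Sequence[float]],
             values: Sequence[Sequence[float]],
             gamma: float = 0.99,
             lam: float = 0.95) -> List[float]:
    """GAE over agent-averaged rewards and values (illustrative assumption).

    rewards[t][i]: reward of agent i at step t, for t in 0..T-1
    values[t][i]:  critic value for agent i at step t, for t in 0..T
                   (the last row is the bootstrap value)
    """
    T = len(rewards)
    if len(values) != T + 1:
        raise ValueError("values needs one more row than rewards")

    # ASSUMPTION: information is shared by averaging over agents.
    mean_r = [sum(row) / len(row) for row in rewards]
    mean_v = [sum(row) / len(row) for row in values]

    advantages = [0.0] * T
    running = 0.0
    # Standard backward GAE recursion on the averaged TD residuals.
    for t in reversed(range(T)):
        delta = mean_r[t] + gamma * mean_v[t + 1] - mean_v[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages
```

With two agents that each receive a reward of 1 for three steps and a critic that outputs 0 everywhere, `team_gae(..., gamma=0.5, lam=1.0)` reduces to the discounted sum of the averaged rewards from each step onward.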
Abstract
This work considers the problem of learning cooperative policies in multi-agent settings with partially observable and non-stationary environments without a communication channel. We focus on improving information sharing between agents and propose a new multi-agent actor-critic method called Multi-Agent Cooperative Recurrent Proximal Policy Optimization (MACRPO). We propose two novel ways of integrating information across agents and time in MACRPO: First, we use a recurrent layer in the critic's network architecture and propose a new framework to use a meta-trajectory to train the recurrent layer. This allows the network to learn the cooperation and dynamics of interactions between agents, and also handle partial observability. Second, we propose a new advantage function that incorporates other agents' rewards and value functions. We evaluate our algorithm on three challenging…
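The abstract does not spell out how the meta-trajectory is assembled. One plausible reading, sketched below, is that the per-agent trajectories are interleaved into a single sequence ordered by time step and then by agent, so that a recurrent critic sees every agent's data for step t before moving on to step t+1. The function name `build_meta_trajectory` and the ordering itself are assumptions made for illustration.

```python
from typing import List, Sequence, TypeVar

S = TypeVar("S")  # one agent's record for one step, e.g. (obs, action, reward)


def build_meta_trajectory(per_agent: Sequence[Sequence[S]]) -> List[S]:
    """Interleave equal-length per-agent trajectories into one sequence.

    per_agent[i][t] is agent i's record at time step t.
    ASSUMPTION: the result is ordered by time step first, then by agent index.
    """
    if not per_agent:
        return []
    T = len(per_agent[0])
    if any(len(traj) != T for traj in per_agent):
        raise ValueError("all agents must have trajectories of equal length")
    return [per_agent[i][t] for t in range(T) for i in range(len(per_agent))]


# Two agents, two steps each: step 0 of both agents precedes step 1.
meta = build_meta_trajectory([["a0", "a1"], ["b0", "b1"]])
```

Feeding such a sequence to a recurrent layer lets its hidden state carry information across agents within a step as well as across steps, which matches the abstract's stated goal of integrating information across agents and time.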
Taxonomy
Topics: Reinforcement Learning in Robotics · Machine Learning and ELM · Data Stream Mining Techniques
Methods: Tanh Activation · Sigmoid Activation · Residual Connection · Long Short-Term Memory · Max Pooling · V-trace · RMSProp · Gradient Clipping · Entropy Regularization
