MACRPO: Multi-Agent Cooperative Recurrent Policy Optimization

Eshagh Kargar; Ville Kyrki

arXiv:2109.00882·cs.LG·September 3, 2021

TL;DR

MACRPO is a multi-agent reinforcement learning method that improves cooperation and information sharing by combining a recurrent critic with a new advantage function that incorporates other agents' rewards and value functions. It is reported to outperform existing algorithms such as QMIX and MADDPG.

Contribution

This paper presents MACRPO, a multi-agent actor-critic algorithm with a recurrent layer in the critic and a new advantage function, aimed at improving cooperation in partially observable, non-stationary settings without a communication channel.

Findings

01. MACRPO outperforms state-of-the-art algorithms like QMIX and MADDPG.

02. Recurrent critic with meta-trajectory training improves cooperation.

03. Incorporating other agents' rewards enhances learning efficiency.

Abstract

This work considers the problem of learning cooperative policies in multi-agent settings with partially observable and non-stationary environments without a communication channel. We focus on improving information sharing between agents and propose a new multi-agent actor-critic method called Multi-Agent Cooperative Recurrent Proximal Policy Optimization (MACRPO). We propose two novel ways of integrating information across agents and time in MACRPO. First, we use a recurrent layer in the critic's network architecture and propose a new framework that uses a meta-trajectory to train the recurrent layer. This allows the network to learn the cooperation and dynamics of interactions between agents, and also to handle partial observability. Second, we propose a new advantage function that incorporates other agents' rewards and value functions. We evaluate our algorithm on three challenging…
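As a rough illustration of the two ideas named in the abstract, the Python sketch below interleaves per-agent trajectories into a single time-ordered sequence (one plausible reading of a "meta-trajectory") and computes a one-step advantage from rewards and value estimates averaged over agents. The abstract does not give the exact construction or formula, so the function names, the tuple layout, the averaging, and the one-step TD form are all assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Tuple

# Hypothetical step layout: (observation id, action id, reward).
Step = Tuple[int, int, float]


def build_meta_trajectory(per_agent: List[List[Step]]) -> List[Tuple[int, int, Step]]:
    """Interleave agents' steps into one sequence ordered by time, then agent.

    Assumption: a recurrent critic would consume this sequence so that its
    hidden state sees every agent's data at each time step.
    """
    horizon = min(len(traj) for traj in per_agent)
    meta = []
    for t in range(horizon):
        for agent, traj in enumerate(per_agent):
            meta.append((t, agent, traj[t]))
    return meta


def shared_td_advantages(
    rewards: List[List[float]],
    values: List[List[float]],
    gamma: float = 0.99,
) -> List[float]:
    """One-step TD advantage using rewards and values averaged over agents.

    rewards[i][t] is agent i's reward at step t (T entries per agent).
    values[i][t] is agent i's value estimate at step t (T + 1 entries per
    agent, the last one being the bootstrap value).
    Assumption: simple averaging stands in for however the paper combines
    other agents' rewards and value functions.
    """
    n_agents = len(rewards)
    horizon = len(rewards[0])
    advantages = []
    for t in range(horizon):
        mean_r = sum(rewards[i][t] for i in range(n_agents)) / n_agents
        mean_v = sum(values[i][t] for i in range(n_agents)) / n_agents
        mean_v_next = sum(values[i][t + 1] for i in range(n_agents)) / n_agents
        advantages.append(mean_r + gamma * mean_v_next - mean_v)
    return advantages
```

With two agents and two time steps, `build_meta_trajectory` yields the order (t=0, agent 0), (t=0, agent 1), (t=1, agent 0), (t=1, agent 1). Averaging is only one way to share reward information; a weighted or summed combination would fit the same interface.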

Code & Models

Repositories

kargarisaac/macrpo (PyTorch, official)

Taxonomy

Topics: Reinforcement Learning in Robotics · Machine Learning and ELM · Data Stream Mining Techniques

Methods: Tanh Activation · Sigmoid Activation · Residual Connection · Long Short-Term Memory · Max Pooling · V-trace · RMSProp · Gradient Clipping · Entropy Regularization