Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments

Ryan Lowe; Yi Wu; Aviv Tamar; Jean Harb; Pieter Abbeel; Igor Mordatch

arXiv:1706.02275 · cs.LG · March 17, 2020 · 1.0k cites


TL;DR

This paper adapts actor-critic methods to multi-agent reinforcement learning. The adapted method copes with the non-stationarity that other learning agents introduce into the environment, and it learns policies that require complex coordination in both cooperative and competitive settings.

Contribution

The paper proposes a multi-agent actor-critic algorithm that takes the action policies of other agents into account, together with a training regimen that uses an ensemble of policies for each agent to make the learned multi-agent policies more robust.
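The ensemble idea can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes each agent holds several sub-policies and picks one uniformly at random at the start of every episode, and the names `PolicyEnsemble`, `begin_episode`, and `act` are hypothetical.

```python
import random


class PolicyEnsemble:
    """Holds several sub-policies for ONE agent (hypothetical helper class).

    Assumption: one sub-policy is drawn uniformly at random per episode,
    so opponents and teammates cannot overfit to a single fixed behavior.
    """

    def __init__(self, sub_policies, seed=None):
        if not sub_policies:
            raise ValueError("an ensemble needs at least one sub-policy")
        self.sub_policies = list(sub_policies)
        self._rng = random.Random(seed)
        self.active_index = 0

    def begin_episode(self):
        # Draw the sub-policy that will act for the whole upcoming episode.
        self.active_index = self._rng.randrange(len(self.sub_policies))
        return self.active_index

    def act(self, observation):
        # Delegate to the currently active sub-policy.
        return self.sub_policies[self.active_index](observation)


# Toy sub-policies: each ignores the observation and returns a constant action.
ensemble = PolicyEnsemble(
    [lambda obs: -1.0, lambda obs: 0.0, lambda obs: 1.0], seed=0
)
chosen = [ensemble.begin_episode() for _ in range(200)]
```

In a training loop, each sub-policy would typically keep its own replay data and be updated only on episodes in which it was active; that bookkeeping is omitted here.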

Findings

01. Outperforms existing methods in cooperative scenarios

02. Effective in competitive environments with diverse strategies

03. Enables learning of complex multi-agent coordination behaviors

Abstract

We explore deep reinforcement learning methods for multi-agent domains. We begin by analyzing the difficulty of traditional algorithms in the multi-agent case: Q-learning is challenged by an inherent non-stationarity of the environment, while policy gradient suffers from a variance that increases as the number of agents grows. We then present an adaptation of actor-critic methods that considers action policies of other agents and is able to successfully learn policies that require complex multi-agent coordination. Additionally, we introduce a training regimen utilizing an ensemble of policies for each agent that leads to more robust multi-agent policies. We show the strength of our approach compared to existing methods in cooperative as well as competitive scenarios, where agent populations are able to discover various physical and informational coordination strategies.
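One way to picture an actor-critic that "considers action policies of other agents" is a critic whose input is the joint observations and actions of all agents, while each actor still acts from its own observation alone. The sketch below illustrates only that input structure under those assumptions; the network sizes, the linear `actor` function, and the class name `CentralizedCritic` are hypothetical, and no training step is shown.

```python
import numpy as np

rng = np.random.default_rng(0)


def init_linear(in_dim, out_dim):
    # Small random weights and zero bias for one dense layer.
    return rng.normal(scale=0.1, size=(in_dim, out_dim)), np.zeros(out_dim)


class CentralizedCritic:
    """Q_i(o_1..o_N, a_1..a_N): scores the JOINT observations and actions."""

    def __init__(self, obs_dims, act_dims, hidden=32):
        in_dim = sum(obs_dims) + sum(act_dims)
        self.w1, self.b1 = init_linear(in_dim, hidden)
        self.w2, self.b2 = init_linear(hidden, 1)

    def __call__(self, observations, actions):
        # Concatenate every agent's observation and action along the feature axis.
        x = np.concatenate(list(observations) + list(actions), axis=-1)
        h = np.maximum(0.0, x @ self.w1 + self.b1)  # ReLU hidden layer
        return (h @ self.w2 + self.b2).squeeze(-1)  # one Q-value per batch row


def actor(weights, obs):
    # Decentralized actor: uses only this agent's own observation.
    return np.tanh(obs @ weights)


obs_dims = [4, 4, 6]  # per-agent observation sizes (illustrative values)
act_dims = [2, 2, 2]  # per-agent action sizes (illustrative values)
batch = 5

observations = [rng.normal(size=(batch, d)) for d in obs_dims]
actor_weights = [
    rng.normal(scale=0.1, size=(o, a)) for o, a in zip(obs_dims, act_dims)
]
actions = [actor(w, o) for w, o in zip(actor_weights, observations)]

# One critic per agent; each one sees the same joint input.
critics = [CentralizedCritic(obs_dims, act_dims) for _ in obs_dims]
q_values = [critic(observations, actions) for critic in critics]
```

Because the critic is conditioned on what every agent does, its learning target does not shift merely because another agent's policy changed, which is the non-stationarity problem the abstract attributes to Q-learning. The joint input is needed only during training; at execution time each actor runs on local observations.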


Taxonomy

Topics: Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI) · Mobile Crowdsensing and Crowdsourcing

Methods: Experience Replay · Dense Connections · Weight Decay · Adam · Convolution · Batch Normalization · MADDPG · Q-Learning