Many Agent Reinforcement Learning Under Partial Observability

Keyang He; Prashant Doshi; Bikramjit Banerjee

arXiv:2106.09825 · cs.LG · June 21, 2021 · 1 citation

Open Access

TL;DR

This paper addresses the scalability challenge in multi-agent reinforcement learning under partial observability by applying action anonymity to improve learning efficiency and effectiveness across broader agent network classes.

Contribution

It applies action anonymity to two deep MARL algorithms, MADDPG and IA2C, and compares the resulting instantiations against mean-field MARL in terms of scalability and performance.

Findings

01

The action-anonymity instantiations of MADDPG and IA2C learn optimal behavior in a broader class of agent networks.

02

Action anonymity improves scalability of deep MARL algorithms.

03

Outperforms mean-field MARL in tested domains.

Abstract

Recent renewed interest in multi-agent reinforcement learning (MARL) has generated an impressive array of techniques that leverage deep reinforcement learning, primarily actor-critic architectures, and can be applied to a limited range of settings in terms of observability and communication. However, a continuing limitation of much of this work is the curse of dimensionality when it comes to representations based on joint actions, which grow exponentially with the number of agents. In this paper, we squarely focus on this challenge of scalability. We apply the key insight of action anonymity, which leads to permutation invariance of joint actions, to two recently presented deep MARL algorithms, MADDPG and IA2C, and compare these instantiations to another recent technique that leverages action anonymity, viz., mean-field MARL. We show that our instantiations can learn the optimal…
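The scalability argument in the abstract can be illustrated with a small counting sketch. It is not taken from the paper: the function names (`action_configuration`, `num_joint_actions`, `num_configurations`) and the toy action labels are assumptions made for illustration. Under action anonymity, only the number of agents choosing each action matters, so a joint action can be replaced by a vector of counts that is invariant to permuting the agents. The number of such count vectors follows the standard stars-and-bars formula and grows polynomially in the number of agents for a fixed action set, whereas the number of ordered joint actions grows exponentially.

```python
from collections import Counter
from math import comb


def action_configuration(joint_action, actions):
    """Map an ordered joint action to per-action counts.

    Agent identities are discarded, so any permutation of
    `joint_action` yields the same tuple of counts.
    """
    counts = Counter(joint_action)
    return tuple(counts[a] for a in actions)


def num_joint_actions(n_agents, n_actions):
    """Number of ordered joint actions: |A| ** N."""
    return n_actions ** n_agents


def num_configurations(n_agents, n_actions):
    """Number of distinct count vectors (stars and bars):
    C(N + |A| - 1, |A| - 1)."""
    return comb(n_agents + n_actions - 1, n_actions - 1)


# Hypothetical toy action set, used only for illustration.
ACTIONS = ("left", "right", "stay")

if __name__ == "__main__":
    a = ("left", "right", "left", "stay")
    b = ("stay", "left", "left", "right")  # same choices, agents permuted
    print(action_configuration(a, ACTIONS))
    print(action_configuration(b, ACTIONS))
    for n in (5, 10, 20):
        print(n, num_joint_actions(n, 3), num_configurations(n, 3))
```

With 10 agents and 3 actions, for example, there are 59049 ordered joint actions but only 66 count vectors. How the paper feeds such counts into the MADDPG and IA2C critics is not specified in the text above, so that step is left out of this sketch.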

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Adaptive Dynamic Programming Control

Methods: Experience Replay · Dense Connections · Adam · Weight Decay · Convolution · Batch Normalization · MADDPG