Parameter Sharing Deep Deterministic Policy Gradient for Cooperative Multi-agent Reinforcement Learning

Xiangxiang Chu; Hangjun Ye

arXiv:1710.00336·cs.AI·October 4, 2017·54 cites

TL;DR

This paper introduces a parameter sharing approach for multi-agent deep deterministic policy gradient methods, significantly improving scalability, learning speed, and memory efficiency in cooperative multi-agent reinforcement learning tasks.

Contribution

It proposes a novel parameter sharing deterministic policy gradient method with three variants, enhancing scalability and efficiency over existing multi-agent DRL approaches.

Findings

01. Outperforms existing methods in multi-agent games
02. Scales well with an increasing number of agents
03. Improves learning speed and memory efficiency

Abstract

Deep reinforcement learning for multi-agent cooperation and competition has recently been a hot topic. This paper focuses on cooperative multi-agent problems, using actor-critic methods under local observation settings. Multi-agent deep deterministic policy gradient obtained state-of-the-art results on some multi-agent games, but it does not scale well as the number of agents grows. To improve scalability, we propose a parameter sharing deterministic policy gradient method with three neural-network-based variants: actor-critic sharing, actor sharing, and actor sharing with a partially shared critic. Benchmarks from rllab show that the proposed method has advantages in learning speed and memory efficiency, scales well as the number of agents grows, and can make full use of reward sharing and exchangeability where available.
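To make the scalability argument concrete, here is a minimal sketch of the actor sharing idea only: every agent feeds its own local observation through the same actor parameters, so the actor's parameter count does not grow with the number of agents. This is an illustration under stated assumptions, not the authors' implementation. The network sizes, the helper names (`init_actor`, `actor_forward`, `n_params`), and the use of plain NumPy are all hypothetical, and the critic, replay buffer, and policy gradient updates are omitted.

```python
import numpy as np


def init_actor(obs_dim, act_dim, hidden=32, seed=0):
    """Create parameters for a small two-layer deterministic actor (hypothetical sizes)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.1, (obs_dim, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 0.1, (hidden, act_dim)),
        "b2": np.zeros(act_dim),
    }


def actor_forward(params, obs):
    """Map local observations to bounded continuous actions in [-1, 1]."""
    h = np.tanh(obs @ params["W1"] + params["b1"])
    return np.tanh(h @ params["W2"] + params["b2"])


def n_params(params):
    """Count scalar parameters in one actor."""
    return sum(p.size for p in params.values())


n_agents, obs_dim, act_dim = 5, 8, 2

# Shared actor: one parameter set reused by every agent.
shared_actor = init_actor(obs_dim, act_dim)

# Baseline for comparison: one independent actor per agent.
independent_actors = [init_actor(obs_dim, act_dim, seed=i) for i in range(n_agents)]

# Each row is one agent's local observation (random placeholder data).
local_obs = np.random.default_rng(1).normal(size=(n_agents, obs_dim))

# With sharing, all agents are evaluated in one batched forward pass.
shared_actions = actor_forward(shared_actor, local_obs)

shared_count = n_params(shared_actor)
independent_count = sum(n_params(p) for p in independent_actors)

print("actions shape:", shared_actions.shape)
print("shared actor parameters:", shared_count)
print("independent actor parameters:", independent_count)
```

In this sketch the independent baseline holds `n_agents` times as many actor parameters as the shared one, which is the memory effect the abstract refers to. How the critic is shared, fully or partially, is not specified in this page and is left out.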

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Autonomous Vehicle Technology and Safety

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings