Decentralized Deterministic Multi-Agent Reinforcement Learning

Antoine Grosnit; Desmond Cai; Laura Wynter

arXiv:2102.09745 · cs.LG · February 22, 2021


TL;DR

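This paper extends decentralized multi-agent actor-critic algorithms to deterministic policies on continuous action spaces. It provides convergence guarantees for linearly approximated value functions and addresses the lack of exploration in deterministic policies by treating both off-policy and on-policy settings.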

Contribution

It introduces a provably convergent decentralized actor-critic algorithm for learning deterministic policies on continuous action spaces, extending the results of [Zhang, ICML 2018], which covered only stochastic policies on finite action spaces.
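The decentralized critic in networked actor-critic methods such as [Zhang, ICML 2018] combines a local temporal-difference update with a consensus step in which agents average critic parameters with their neighbours. The sketch below illustrates that pattern only; it is not the paper's algorithm. It assumes a discounted TD(0) critic that is linear in features, a doubly stochastic consensus matrix `C`, and a hypothetical helper name `consensus_critic_step`. The paper's exact setting (reward criterion, step-size schedules, feature choice) may differ.

```python
import numpy as np

def consensus_critic_step(W, C, phi, phi_next, rewards, alpha=0.1, gamma=0.95):
    """One local TD(0) critic update per agent, followed by consensus averaging.

    Hypothetical sketch, not the paper's exact update.
    W        : (N, d) array; row i holds agent i's linear critic weights w_i.
    C        : (N, N) doubly stochastic consensus matrix (nonzero only between neighbours).
    phi      : (d,) features phi(s, a) of the current joint state-action.
    phi_next : (d,) features phi(s', a') of the next joint state-action.
    rewards  : (N,) local rewards r_i, each seen only by its own agent.
    """
    q = W @ phi                                   # each agent's estimate Q(s, a; w_i)
    q_next = W @ phi_next                         # each agent's estimate Q(s', a'; w_i)
    delta = rewards + gamma * q_next - q          # local TD errors, one per agent
    W_tilde = W + alpha * delta[:, None] * phi[None, :]  # local TD(0) step
    W_new = C @ W_tilde                           # mix parameters with neighbours
    return W_new, delta


# Usage: three agents on a ring; each weights itself 0.5 and each neighbour 0.25.
C_ring = np.array([[0.5, 0.25, 0.25],
                   [0.25, 0.5, 0.25],
                   [0.25, 0.25, 0.5]])
W0 = np.zeros((3, 2))
W1, td = consensus_critic_step(W0, C_ring, np.array([1.0, 0.0]),
                               np.array([0.0, 1.0]), np.array([1.0, 2.0, 3.0]))
```

Because `C` is doubly stochastic, the mixing step preserves the network-wide average of the critic weights, so agents can move toward a shared critic while exchanging only parameters and keeping their rewards private.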

Findings

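01. Convergence guarantees for the proposed decentralized deterministic actor-critic algorithms with linearly approximated value functions.

02. An expression for a local deterministic policy gradient, with off-policy and on-policy variants that compensate for the lack of exploration in deterministic policies.

03. A step toward decentralized MARL in high-dimensional, continuous action spaces.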

Abstract

[Zhang, ICML 2018] provided the first decentralized actor-critic algorithm for multi-agent reinforcement learning (MARL) that offers convergence guarantees. In that work, policies are stochastic and are defined on finite action spaces. We extend those results to offer a provably-convergent decentralized actor-critic algorithm for learning deterministic policies on continuous action spaces. Deterministic policies are important in real-world settings. To handle the lack of exploration inherent in deterministic policies, we consider both off-policy and on-policy settings. We provide the expression of a local deterministic policy gradient, decentralized deterministic actor-critic algorithms and convergence guarantees for linearly-approximated value functions. This work will help enable decentralized MARL in high-dimensional action spaces and pave the way for more widespread use of MARL.
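To illustrate the structure of a local deterministic policy gradient, here is a chain-rule sketch under strong, illustrative assumptions: linear deterministic policies a_i = μ_i(s) = θ_i · s, a hypothetical critic that is linear in the features φ(s, a) = [s, a, a²], and each agent differentiating only its own critic copy w_i. It shows the generic deterministic-policy-gradient form ∇_{θ_i} μ_i(s) · ∂Q/∂a_i evaluated at a = μ(s); it is not the paper's exact expression, and all function names are hypothetical.

```python
import numpy as np

def features(s, a):
    """Hypothetical critic features phi(s, a) = [s, a, a**2] for joint action a."""
    return np.concatenate([s, a, a ** 2])

def q_value(w, s, a):
    """Linear critic Q(s, a; w) = w . phi(s, a)."""
    return w @ features(s, a)

def joint_action(thetas, s):
    """Linear deterministic policies: row i of thetas gives a_i = theta_i . s."""
    return thetas @ s

def local_policy_gradient(i, thetas, w_i, s):
    """Agent i's gradient: grad_{theta_i} mu_i(s) * dQ(s, a; w_i)/da_i at a = mu(s)."""
    m = s.shape[0]                 # state dimension
    n = thetas.shape[0]            # number of agents (one scalar action each)
    a = joint_action(thetas, s)
    w_a = w_i[m:m + n]             # critic weights on the linear action features
    w_aa = w_i[m + n:]             # critic weights on the squared action features
    dq_da_i = w_a[i] + 2.0 * a[i] * w_aa[i]
    return s * dq_da_i             # grad_{theta_i} mu_i(s) = s for a linear policy
```

An actor step would then be θ_i ← θ_i + β g_i for a small step size β, where g_i is the output of `local_policy_gradient`. Agent i needs only its own policy parameters, its own critic copy, and the joint action, which is why a consensus-trained critic makes the update decentralized.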


Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control