TL;DR
This paper introduces a deep neural actor-critic method for decentralized multi-agent reinforcement learning, providing the first finite-time global convergence guarantee in this setting.
Contribution
It develops the first deep neural actor-critic algorithm for decentralized MARL with a proven global optimality guarantee and a finite-time convergence rate.
Findings
Achieves a finite-time convergence rate of O(1/T).
Provides the first global convergence guarantee for deep neural actor-critic in MARL.
Numerical experiments confirm theoretical results.
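To make the O(1/T) rate above concrete, here is a minimal sketch of what such a bound implies for iteration counts. It assumes a generic bound of the form error(T) <= c / T; the constant `c`, the error measure, and the function name `iterations_for_target` are hypothetical illustrations and are not taken from the paper.

```python
import math


def iterations_for_target(c: float, eps: float) -> int:
    """Smallest T with c / T <= eps, under an ASSUMED bound error(T) <= c / T.

    `c` is a hypothetical problem-dependent constant; the paper's actual
    constants and error measure are not reproduced here.
    """
    if c <= 0 or eps <= 0:
        raise ValueError("c and eps must be positive")
    return math.ceil(c / eps)


# Halving the target error roughly doubles the required iterations.
for eps in (0.5, 0.25, 0.125):
    print(eps, iterations_for_target(10.0, eps))
```

Under this kind of bound, the iteration count scales linearly with 1/eps, which is the practical meaning of an O(1/T) rate.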
Abstract
Actor-critic methods for decentralized multi-agent reinforcement learning (MARL) facilitate collaborative optimal decision making without centralized coordination, thus enabling a wide range of applications in practice. To date, however, most theoretical convergence studies of existing actor-critic decentralized MARL methods are limited to guaranteeing a stationary solution under linear function approximation. This leaves a significant gap between the highly successful use of deep neural actor-critic methods for decentralized MARL in practice and the current theoretical understanding. To bridge this gap, in this paper, we make the first attempt to develop a deep neural actor-critic method for decentralized MARL, where both the actor and critic components are inherently non-linear. We show that our proposed method enjoys a global optimality guarantee with a finite-time convergence rate…