Distributed Neural Policy Gradient Algorithm for Global Convergence of Networked Multi-Agent Reinforcement Learning
Pengcheng Dai, Yuanqiu Mo, Wenwu Yu, and Wei Ren

TL;DR
This paper introduces a distributed neural policy gradient algorithm for multi-agent reinforcement learning that ensures global convergence and improves collaborative policy evaluation using neural networks.
Contribution
The paper proposes a novel distributed neural policy gradient method with two neural networks for Q-functions and policies, ensuring global convergence in multi-agent settings.
Findings
Proves global convergence of the proposed algorithm.
Demonstrates effectiveness through simulation in robot path planning.
Outperforms centralized algorithms in collaborative tasks.
Abstract
This paper studies the networked multi-agent reinforcement learning (NMARL) problem, where the objective of agents is to collaboratively maximize the discounted average cumulative rewards. Unlike existing methods, whose expressive power is limited by linear function approximation, we propose a distributed neural policy gradient algorithm that features two innovatively designed neural networks, one for the approximate Q-functions and one for the policy functions of agents. The algorithm consists of two key components: the distributed critic step and the decentralized actor step. In the distributed critic step, agents receive the approximate Q-function parameters from their neighboring agents via a time-varying communication network to collaboratively evaluate the joint policy. In contrast, in the decentralized actor step, each agent updates…
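The distributed critic step described in the abstract can be illustrated with a generic consensus-style parameter exchange. The sketch below is not the paper's update rule: the Metropolis mixing weights, the flat parameter vectors, and the names `metropolis_weights` and `critic_consensus_step` are all assumptions made for illustration, and the local gradient is left as an input because the summary does not specify the critic loss or the network architecture.

```python
import numpy as np


def metropolis_weights(adjacency):
    """Build a doubly stochastic mixing matrix from an undirected graph.

    Assumption: Metropolis weights are used here only as a common textbook
    choice; the paper's actual weight matrices are not given in this summary.
    """
    adjacency = np.asarray(adjacency, dtype=float)
    n = adjacency.shape[0]
    degree = adjacency.sum(axis=1)
    weights = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and adjacency[i, j] > 0:
                weights[i, j] = 1.0 / (1.0 + max(degree[i], degree[j]))
        # Self-weight absorbs the remainder so that each row sums to one.
        weights[i, i] = 1.0 - weights[i].sum()
    return weights


def critic_consensus_step(params, weights, local_grads, step_size):
    """One hypothetical critic update for all agents at once.

    params:      (n_agents, dim) array, row i holds agent i's critic parameters.
    weights:     (n_agents, n_agents) mixing matrix for the current graph.
    local_grads: (n_agents, dim) array of each agent's local gradient
                 (placeholder; the real loss is not specified here).
    """
    # Each agent averages the parameters received from its current neighbors,
    mixed = weights @ params
    # then takes a local gradient step on its own data.
    return mixed - step_size * local_grads
```

To mimic a time-varying communication network, `metropolis_weights` can be called with a different adjacency matrix at each iteration. With zero local gradients and graphs that are connected when taken together over time, the rows of `params` approach their common average, which is the agreement behaviour that consensus-based critic updates generally rely on.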
Taxonomy
Topics: Advanced Sensor and Control Systems
