Distributed Neural Policy Gradient Algorithm for Global Convergence of Networked Multi-Agent Reinforcement Learning

Pengcheng Dai, Yuanqiu Mo, Wenwu Yu, and Wei Ren

arXiv:2505.24113·cs.MA·June 2, 2025

Open Access

TL;DR

This paper introduces a distributed neural policy gradient algorithm for multi-agent reinforcement learning that ensures global convergence and improves collaborative policy evaluation using neural networks.

Contribution

The paper proposes a novel distributed neural policy gradient method with two neural networks for Q-functions and policies, ensuring global convergence in multi-agent settings.

Findings

01 Proves global convergence of the proposed algorithm.

02 Demonstrates effectiveness through simulation in robot path planning.

03 Outperforms centralized algorithms in collaborative tasks.

Abstract

This paper studies the networked multi-agent reinforcement learning (NMARL) problem, where the objective of agents is to collaboratively maximize the discounted average cumulative rewards. Unlike existing methods, whose expressiveness is limited by linear function approximation, we propose a distributed neural policy gradient algorithm that features two specially designed neural networks for the agents' approximate Q-functions and policy functions. The algorithm consists of two key components: the distributed critic step and the decentralized actor step. In the distributed critic step, agents receive the approximate Q-function parameters from their neighboring agents via a time-varying communication network to collaboratively evaluate the joint policy. In contrast, in the decentralized actor step, each agent updates…
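The parameter exchange in the distributed critic step can be illustrated with a generic consensus-averaging sketch. This is not the paper's update rule: the Metropolis weighting, the two alternating graphs, the parameter dimensions, and the function names `metropolis_weights` and `consensus_step` are all assumptions made for illustration, and the local gradient step each agent would also take on its critic is omitted.

```python
import numpy as np


def metropolis_weights(adjacency):
    """Build a doubly stochastic weight matrix from a symmetric 0/1 adjacency matrix.

    Metropolis weighting is an assumed choice, not taken from the paper.
    """
    n = adjacency.shape[0]
    degrees = adjacency.sum(axis=1)
    weights = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and adjacency[i, j]:
                weights[i, j] = 1.0 / (1.0 + max(degrees[i], degrees[j]))
        # The self-weight absorbs the remainder so that each row sums to one.
        weights[i, i] = 1.0 - weights[i].sum()
    return weights


def consensus_step(params, adjacency):
    """Replace each agent's parameters with a weighted average over itself and its neighbors."""
    return metropolis_weights(adjacency) @ params


# Hypothetical setup: 4 agents, each holding a 3-dimensional critic parameter vector.
rng = np.random.default_rng(0)
params = rng.normal(size=(4, 3))
initial_mean = params.mean(axis=0)

# Time-varying network: neither graph is connected on its own,
# but their union (the cycle 0-1-2-3-0) is.
graph_a = np.array([[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]])
graph_b = np.array([[0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0], [1, 0, 0, 0]])

for t in range(50):
    params = consensus_step(params, graph_a if t % 2 == 0 else graph_b)

# Largest deviation of any agent's parameters from the network-wide average.
disagreement = np.abs(params - params.mean(axis=0)).max()
```

Because the weight matrices are doubly stochastic, the network-wide average of the parameters is preserved at every step, while the disagreement between agents shrinks even though no single graph connects all four agents.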

Taxonomy

Topics: Advanced Sensor and Control Systems