Distributed Policy Gradient with Variance Reduction in Multi-Agent Reinforcement Learning
Xiaoxiao Zhao, Jinlong Lei, Li Li, Jie Chen

TL;DR
This paper introduces a distributed policy gradient method with variance reduction and gradient tracking for multi-agent reinforcement learning, addressing high variance and distribution shift issues to improve convergence and efficiency.
Contribution
It proposes a novel distributed policy gradient algorithm that incorporates variance reduction, gradient tracking, and importance weights, specifically designed for non-concave MARL problems.
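As a rough illustration of the importance-weight ingredient, the sketch below computes the likelihood ratio of one trajectory under two policy parameter vectors. It assumes a tabular softmax policy with logits of shape (n_states, n_actions); the function names, the policy class, and the direction of the ratio (target over behavior) are assumptions made for this example and are not taken from the paper. The environment's transition probabilities cancel in the ratio, so only the policy terms remain.

```python
import numpy as np


def log_softmax(logits):
    """Row-wise log-softmax, shifted by the row maximum for numerical stability."""
    z = logits - logits.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def importance_weight(theta_target, theta_behavior, states, actions):
    """Trajectory likelihood ratio p(tau | target) / p(tau | behavior).

    Hypothetical helper: theta_* are logit tables of shape (n_states, n_actions),
    and states/actions are integer arrays of equal length describing one trajectory.
    """
    logp_target = log_softmax(theta_target)[states, actions]
    logp_behavior = log_softmax(theta_behavior)[states, actions]
    # Sum log-ratios over time steps, then exponentiate once.
    return float(np.exp(np.sum(logp_target - logp_behavior)))


# One state, two actions. Behavior policy is uniform (0.5, 0.5);
# target policy has probabilities (0.25, 0.75).
theta_behavior = np.zeros((1, 2))
theta_target = np.array([[0.0, np.log(3.0)]])
states = np.array([0, 0])
actions = np.array([1, 0])
weight = importance_weight(theta_target, theta_behavior, states, actions)
```

Here the weight is (0.75 / 0.5) × (0.25 / 0.5) = 0.75. Multiplying a gradient estimate computed at one parameter vector by such a weight is the standard way to correct for trajectories that were sampled under a different one.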
Findings
Provides an upper bound on the mean-squared stationary gap.
Establishes sample and communication complexity for convergence.
Numerical experiments validate the algorithm's effectiveness.
Abstract
This paper studies distributed policy gradient in collaborative multi-agent reinforcement learning (MARL), where agents over a communication network aim to find the optimal policy that maximizes the average of all agents' local returns. Because the performance function of policy gradient is non-concave, existing distributed stochastic optimization methods for convex problems cannot be directly applied to policy gradient in MARL. This paper proposes a distributed policy gradient method with variance reduction and gradient tracking to address the high variance of policy gradient, and uses importance weights to solve the distribution shift problem in the sampling process. We then provide an upper bound on the mean-squared stationary gap, which depends on the number of iterations, the mini-batch size, the epoch size, the problem parameters, and the network topology. We further establish the…
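The gradient-tracking idea mentioned in the abstract can be sketched on a toy problem. The example below is not the paper's algorithm: it uses exact gradients of hypothetical concave local objectives J_i(theta) = -0.5 (theta - b_i)^2 in place of stochastic policy gradients, omits variance reduction and importance weights, and assumes a four-agent ring network with a doubly stochastic mixing matrix W. It only shows how each agent maintains a tracker y_i of the network-average gradient while mixing its parameter with its neighbors.

```python
import numpy as np


def gradient_tracking(b, W, alpha=0.1, iters=300):
    """Gradient-tracking ascent on the toy objectives J_i(theta) = -0.5 * (theta - b_i)**2.

    b     : array of per-agent targets (hypothetical local problem data)
    W     : doubly stochastic mixing matrix of the communication network
    alpha : step size (assumed value, chosen for this toy problem)
    """
    theta = np.zeros_like(b)   # one scalar parameter per agent
    grad = b - theta           # local gradients of J_i at the current theta_i
    y = grad.copy()            # trackers start at the local gradients
    for _ in range(iters):
        # Mix parameters with neighbors, then ascend along the tracked gradient.
        theta_next = W @ theta + alpha * y
        grad_next = b - theta_next
        # Mix trackers, then add the change in the local gradient.
        y = W @ y + grad_next - grad
        theta, grad = theta_next, grad_next
    return theta


b = np.array([1.0, 2.0, 3.0, 6.0])
# Ring of four agents: weight 0.5 on self, 0.25 on each of the two neighbors.
W = np.array([
    [0.50, 0.25, 0.00, 0.25],
    [0.25, 0.50, 0.25, 0.00],
    [0.00, 0.25, 0.50, 0.25],
    [0.25, 0.00, 0.25, 0.50],
])
theta = gradient_tracking(b, W)
```

Because W is doubly stochastic, the average of the trackers always equals the average of the local gradients, so every agent's parameter approaches the maximizer of the averaged objective, which here is mean(b) = 3.0, rather than its own local optimum b_i.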
Taxonomy
Topics: Distributed Control Multi-Agent Systems · Reinforcement Learning in Robotics · Advanced MIMO Systems Optimization
