MDPGT: Momentum-based Decentralized Policy Gradient Tracking

Zhanhong Jiang; Xian Yeow Lee; Sin Yong Tan; Kai Liang Tan; Aditya Balu; Young M. Lee; Chinmay Hegde; Soumik Sarkar

arXiv:2112.02813 · cs.LG · December 7, 2021

TL;DR

This paper introduces MDPGT, a novel momentum-based decentralized policy gradient method that improves sample efficiency and convergence speed in multi-agent reinforcement learning, validated through theoretical analysis and empirical experiments.

Contribution

The paper presents a new momentum-based variance reduction technique and a gradient tracking mechanism for decentralized policy gradients, achieving the best available sample complexity and a linear speedup in the number of agents in multi-agent RL.

Findings

01. Achieves the best available sample complexity of $O(N^{-1} ϵ^{-3})$ for converging to an $ϵ$-stationary point, where $N$ is the number of agents.

02. Validates theoretical claims with experiments on multi-agent RL benchmarks.

03. Demonstrates linear speedup in $N$ when the error tolerance $ϵ$ is small.
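To make the scaling in the first and third findings concrete, here is a small arithmetic sketch. It reads the bound as the number of trajectories each agent needs, and it ignores every hidden constant: the function name and the `constant` parameter are hypothetical and do not come from the paper, so only ratios between calls are meaningful.

```python
import math


def per_agent_trajectories(num_agents, epsilon, constant=1.0):
    """Illustrative N^{-1} * eps^{-3} scaling; `constant` is a made-up placeholder."""
    if num_agents < 1 or epsilon <= 0:
        raise ValueError("need num_agents >= 1 and epsilon > 0")
    return constant * epsilon ** -3 / num_agents


# Linear speedup: eight times more agents, one eighth of the per-agent budget.
speedup = per_agent_trajectories(1, 0.1) / per_agent_trajectories(8, 0.1)

# Halving the tolerance multiplies the budget by 2^3 = 8 under this scaling.
tolerance_cost = per_agent_trajectories(4, 0.05) / per_agent_trajectories(4, 0.1)

assert math.isclose(speedup, 8.0)
assert math.isclose(tolerance_cost, 8.0)
```

Under this reading, adding agents and tightening the tolerance trade off against each other: an eightfold increase in $N$ offsets a halving of $ϵ$.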

Abstract

We propose a novel policy gradient method for multi-agent reinforcement learning, which leverages two different variance-reduction techniques and does not require large batches over iterations. Specifically, we propose a momentum-based decentralized policy gradient tracking (MDPGT) where a new momentum-based variance reduction technique is used to approximate the local policy gradient surrogate with importance sampling, and an intermediate parameter is adopted to track two consecutive policy gradient surrogates. Moreover, MDPGT provably achieves the best available sample complexity of $O(N^{-1} ϵ^{-3})$ for converging to an $ϵ$-stationary point of the global average of $N$ local performance functions (possibly nonconcave). This outperforms the state-of-the-art sample complexity in decentralized model-free reinforcement learning, and when initialized with a single…
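The two ingredients named in the abstract, a momentum-based variance-reduced gradient surrogate and a tracker driven by the difference of two consecutive surrogates, can be sketched on a toy problem. This is a minimal illustration under loud assumptions, not the authors' algorithm or their `xylee95/md-pgt` code: each agent maximizes a hypothetical quadratic $J_i(x) = -\tfrac{1}{2}\|x - c_i\|^2$ instead of an RL return, the stochastic gradient is the true gradient plus Gaussian noise, the importance weight is fixed to 1 because the noise does not depend on the parameters, and the agents communicate over an assumed ring graph. All names (`ring_mixing_matrix`, `run_mdpgt_sketch`, `eta`, `beta`, `sigma`) are made up for the example.

```python
import numpy as np


def ring_mixing_matrix(n):
    """Doubly stochastic matrix: each agent averages itself and its ring neighbours."""
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i, (i - 1) % n, (i + 1) % n):
            W[i, j] += 1.0 / 3.0
    return W


def stochastic_grad(x, targets, noise):
    """Noisy gradient of the toy objective J_i(x) = -0.5 * ||x - c_i||^2, one row per agent."""
    return -(x - targets) + noise


def run_mdpgt_sketch(n_agents=5, dim=3, steps=300, eta=0.1, beta=0.2, sigma=0.1, seed=0):
    rng = np.random.default_rng(seed)
    W = ring_mixing_matrix(n_agents)
    targets = rng.normal(size=(n_agents, dim))  # hypothetical local optima c_i

    # Initialization: surrogate u and tracker v both start from one noisy gradient.
    x_prev = np.zeros((n_agents, dim))
    noise = sigma * rng.normal(size=(n_agents, dim))
    u = stochastic_grad(x_prev, targets, noise)
    v = u.copy()
    x = W @ (x_prev + eta * v)

    for _ in range(steps):
        # One fresh sample per agent, evaluated at both the new and the old parameters.
        noise = sigma * rng.normal(size=(n_agents, dim))
        g_new = stochastic_grad(x, targets, noise)
        g_old = stochastic_grad(x_prev, targets, noise)
        omega = 1.0  # importance weight; trivially 1 in this toy setting (assumption)

        # Momentum-based variance reduction of the local gradient surrogate.
        u_new = beta * g_new + (1.0 - beta) * (u + g_new - omega * g_old)
        # Tracking step: mix with neighbours the change between consecutive surrogates.
        v = W @ (v + u_new - u)
        # Ascent step followed by neighbour averaging.
        x_prev, x = x, W @ (x + eta * v)
        u = u_new

    return x, targets


x_final, targets = run_mdpgt_sketch()
# Every agent should end near the maximizer of the average objective, mean(c_i).
assert np.allclose(x_final, targets.mean(axis=0), atol=0.2)
```

In a real policy gradient setting the sample is a trajectory drawn under the current policy, so `omega` would be a genuine importance-sampling ratio between the old and new policies rather than 1. Note also that each iteration uses a single sample per agent, which mirrors the abstract's point that large batches are not required.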

Code & Models

Repositories

xylee95/md-pgt
PyTorch · Official

Videos

MDPGT: Momentum-Based Decentralized Policy Gradient Tracking

Taxonomy

Topics: Reinforcement Learning in Robotics · Model Reduction and Neural Networks · Machine Learning and ELM