MDPGT: Momentum-based Decentralized Policy Gradient Tracking
Zhanhong Jiang, Xian Yeow Lee, Sin Yong Tan, Kai Liang Tan, Aditya Balu, Young M. Lee, Chinmay Hegde, Soumik Sarkar

TL;DR
This paper introduces MDPGT, a novel momentum-based decentralized policy gradient method that improves sample efficiency and convergence speed in multi-agent reinforcement learning, validated through theoretical analysis and empirical experiments.
Contribution
The paper presents a new momentum-based variance reduction technique and a gradient tracking mechanism for decentralized policy gradients, achieving the best available sample complexity and linear speedup in multi-agent RL.
Findings
Achieves the best known sample complexity of O(N^{-1}ε^{-3}) for convergence.
Validates theoretical claims with experiments on multi-agent RL benchmarks.
Demonstrates linear speedup when the error tolerance is small.
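As a rough illustration of what the O(N^{-1}ε^{-3}) bound implies, the sketch below evaluates the expression with all constants dropped. The helper name relative_samples is hypothetical, and reading the bound as a per-agent sample count is an assumption made here for illustration, not a statement taken from the paper.

```python
def relative_samples(n_agents, eps):
    """Value of N^{-1} * eps^{-3}, ignoring the constants hidden by O(.).

    Only ratios between two calls are meaningful; the absolute number is not
    a sample count.
    """
    return 1.0 / (n_agents * eps ** 3)


# Doubling the number of agents halves the expression (linear speedup).
speedup_ratio = relative_samples(2, 0.1) / relative_samples(1, 0.1)

# Halving the error tolerance multiplies the expression by 2^3 = 8.
tolerance_ratio = relative_samples(1, 0.05) / relative_samples(1, 0.1)
```

Under this reading, the number of agents enters linearly while the tolerance enters cubically, so tightening ε is far more costly than shrinking the network.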
Abstract
We propose a novel policy gradient method for multi-agent reinforcement learning, which leverages two different variance-reduction techniques and does not require large batches over iterations. Specifically, we propose a momentum-based decentralized policy gradient tracking (MDPGT) where a new momentum-based variance reduction technique is used to approximate the local policy gradient surrogate with importance sampling, and an intermediate parameter is adopted to track two consecutive policy gradient surrogates. Moreover, MDPGT provably achieves the best available sample complexity of O(N^{-1}ε^{-3}) for converging to an ε-stationary point of the global average of N local performance functions (possibly nonconcave). This outperforms the state-of-the-art sample complexity in decentralized model-free reinforcement learning, and when initialized with a single…
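The two ingredients named in the abstract, a momentum-based variance-reduced gradient surrogate and a tracker of consecutive surrogates, can be sketched on a toy problem. The code below is not the authors' algorithm: it is a generic recursive-momentum plus gradient-tracking loop in that style. Everything in it is an assumption for illustration: the ring topology, the quadratic local objectives standing in for RL performance functions, the step sizes, and the importance weight fixed to 1 (valid here only because the noise does not depend on the parameters, unlike trajectories in RL).

```python
import numpy as np


def ring_mixing_matrix(n):
    """Doubly stochastic mixing matrix for a ring of n agents (assumed topology)."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] += 1 / 3
        W[i, (i - 1) % n] += 1 / 3
        W[i, (i + 1) % n] += 1 / 3
    return W


def stochastic_grad(theta, c, noise):
    """Noisy gradient of the toy objective J_i(theta) = -0.5 * ||theta - c_i||^2."""
    return (c - theta) + noise


def run(n_agents=5, dim=3, steps=500, eta=0.05, beta=0.2, seed=0):
    rng = np.random.default_rng(seed)
    W = ring_mixing_matrix(n_agents)
    c = rng.normal(size=(n_agents, dim))      # local optima, one row per agent
    theta_prev = np.zeros((n_agents, dim))

    # Initialization from a single noisy sample per agent.
    noise = 0.1 * rng.normal(size=(n_agents, dim))
    u = stochastic_grad(theta_prev, c, noise)  # local gradient surrogate
    v = W @ u                                  # gradient tracker
    theta = W @ (theta_prev + eta * v)         # ascent step followed by mixing

    for _ in range(steps):
        # One fresh sample per agent, reused at the current and previous iterate.
        noise = 0.1 * rng.normal(size=(n_agents, dim))
        g_new = stochastic_grad(theta, c, noise)
        g_old = stochastic_grad(theta_prev, c, noise)
        iw = 1.0  # importance weight; assumed 1 because sampling ignores theta here

        # Momentum-based variance-reduced surrogate.
        u_new = beta * g_new + (1 - beta) * (u + g_new - iw * g_old)
        # Track the difference of two consecutive surrogates across neighbors.
        v = W @ (v + u_new - u)
        u = u_new

        theta_prev = theta
        theta = W @ (theta + eta * v)

    return theta, c.mean(axis=0)


theta, target = run()
```

In this toy setting every agent's parameters should end up close to the maximizer of the average objective, which is the mean of the local optima. In an actual policy-gradient setting the importance weight would be a ratio of trajectory probabilities under the previous and current policies rather than a constant.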
Taxonomy
Topics: Reinforcement Learning in Robotics · Model Reduction and Neural Networks · Machine Learning and ELM
