Taming Multi-Agent Reinforcement Learning with Estimator Variance Reduction
Taher Jafferjee, Juliusz Ziomek, Tianpei Yang, Zipeng Dai, Jianhong Wang, Matthew Taylor, Kun Shao, Jun Wang, David Mguni

TL;DR
This paper introduces PERLA, a variance reduction technique for multi-agent reinforcement learning that improves the accuracy of value estimates, enhances scalability, and maintains the benefits of centralized training.
Contribution
PERLA is a novel framework that reduces estimator variance in centralised-training, decentralised-execution (CT-DE) MARL methods by feeding samples of the agents' joint policy into the critics during training, enabling more efficient and scalable learning.
Findings
PERLA significantly reduces variance in value estimates.
PERLA improves learning efficiency in multi-agent benchmarks.
PERLA outperforms baseline methods in Multi-Agent MuJoCo and StarCraft II.
Abstract
Centralised training with decentralised execution (CT-DE) serves as the foundation of many leading multi-agent reinforcement learning (MARL) algorithms. Despite its popularity, it suffers from a critical drawback due to its reliance on learning from a single sample of the joint-action at a given state. As agents explore and update their policies during training, these single samples may poorly represent the actual joint-policy of the system of agents, leading to high-variance gradient estimates that hinder learning. To address this problem, we propose an enhancement tool that accommodates any actor-critic MARL method. Our framework, Performance Enhancing Reinforcement Learning Apparatus (PERLA), introduces a sampling technique of the agents' joint-policy into the critics while the agents train. This leads to TD updates that closely approximate the true expected value under the current…
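The variance argument in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the critic, the Bernoulli policies of the other agents, and every function name below are hypothetical stand-ins chosen only to show why averaging a critic over several sampled joint actions gives a lower-variance value estimate than relying on a single sample.

```python
import numpy as np

def critic_value(own_action, other_actions):
    """Hypothetical toy critic: own action plus the sum of the other agents' actions."""
    return own_action + np.sum(other_actions, axis=-1)

def sample_other_actions(rng, n_others, k):
    """Draw k joint actions of the other agents (assumed Bernoulli(0.5) policies)."""
    return rng.integers(0, 2, size=(k, n_others)).astype(float)

def value_estimate(rng, own_action, n_others, k):
    """Average the critic over k sampled joint actions; k=1 is the single-sample case."""
    others = sample_other_actions(rng, n_others, k)
    return float(np.mean(critic_value(own_action, others)))

def estimator_variance(k, n_others=4, trials=5000, seed=0):
    """Empirical variance of the k-sample value estimate over repeated trials."""
    rng = np.random.default_rng(seed)
    estimates = [value_estimate(rng, 1.0, n_others, k) for _ in range(trials)]
    return float(np.var(estimates))

var_single = estimator_variance(k=1)     # one joint-action sample per estimate
var_averaged = estimator_variance(k=16)  # average over 16 sampled joint actions
print(f"single-sample variance: {var_single:.4f}")
print(f"16-sample variance:     {var_averaged:.4f}")
```

Because the samples in this toy setting are independent, the variance of the averaged estimate shrinks roughly in proportion to 1/k. How PERLA draws and uses its samples inside the critic is described in the paper itself and may differ from this sketch.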
Taxonomy
Topics: Reinforcement Learning in Robotics · Auction Theory and Applications · Adaptive Dynamic Programming Control
