Reducing Variance Caused by Communication in Decentralized Multi-agent Deep Reinforcement Learning
Changxi Zhu, Mehdi Dastani, Shihan Wang

TL;DR
This paper analyzes how communication-induced uncertainty affects variance in decentralized multi-agent deep reinforcement learning and proposes modular techniques to reduce this variance, improving training stability and performance.
Contribution
It provides a theoretical analysis of communication-induced variance and introduces modular variance reduction techniques for decentralized MADRL algorithms.
Findings
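Reducing communication-induced variance in policy gradients improves training stability.
Algorithms extended with the techniques achieve higher performance.
The techniques are validated on multiple tasks from the StarCraft Multi-Agent Challenge and Traffic Junction domains.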
Abstract
In decentralized multi-agent deep reinforcement learning (MADRL), communication can help agents to gain a better understanding of the environment to better coordinate their behaviors. Nevertheless, communication may involve uncertainty, which potentially introduces variance to the learning of decentralized agents. In this paper, we focus on a specific decentralized MADRL setting with communication and conduct a theoretical analysis to study the variance that is caused by communication in policy gradients. We propose modular techniques to reduce the variance in policy gradients during training. We adopt our modular techniques into two existing algorithms for decentralized MADRL with communication and evaluate them on multiple tasks in the StarCraft Multi-Agent Challenge and Traffic Junction domains. The results show that decentralized MADRL communication methods extended with our…
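The idea that uncertain messages add variance to policy gradients can be illustrated with a toy decomposition. The sketch below is not the paper's analysis or technique; it is a minimal illustration that assumes a single agent with a Bernoulli policy, a Gaussian-noise message, and a made-up reward. All names (`grad_sample`, `decompose_variance`, `noise_std`) are hypothetical. It applies the law of total variance to split the variance of a score-function gradient estimate into a part due to action sampling and a part due to the received message.

```python
import math
import random
import statistics


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def grad_sample(theta, message, rng):
    """One score-function gradient sample for a toy Bernoulli policy."""
    p = sigmoid(theta + message)           # P(action = 1 | message)
    action = 1 if rng.random() < p else 0
    reward = float(action)                 # toy reward (assumption): 1 for action 1
    return (action - p) * reward           # d/dtheta log pi(action | message) * reward


def decompose_variance(theta=0.0, noise_std=1.0, n_messages=500,
                       n_actions=200, seed=0):
    """Split gradient variance into within-message and between-message parts."""
    rng = random.Random(seed)
    groups = []
    for _ in range(n_messages):
        message = rng.gauss(0.0, noise_std)   # uncertain message (assumption)
        groups.append([grad_sample(theta, message, rng)
                       for _ in range(n_actions)])

    all_samples = [g for group in groups for g in group]
    total = statistics.pvariance(all_samples)
    # Average variance for a fixed message: caused by action sampling alone.
    within = statistics.fmean([statistics.pvariance(group) for group in groups])
    # Variance of per-message mean gradients: an empirical estimate of the
    # message-induced part (it also contains some finite-sample noise).
    between = statistics.pvariance([statistics.fmean(group) for group in groups])
    return total, within, between


if __name__ == "__main__":
    total, within, between = decompose_variance()
    print(f"total={total:.5f} within={within:.5f} between={between:.5f}")
```

With equal-sized groups, `total` equals `within + between` up to floating-point error. In this toy setting, the `between` term is the share of gradient variance attributable to the message, which is the kind of quantity a variance reduction technique would aim to shrink.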
Taxonomy
Topics: Advanced Research in Systems and Signal Processing · Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications
