Centralized Adaptive Sampling for Reliable Co-Training of Independent Multi-Agent Policies
Nicholas E. Corrado, Josiah P. Hanna

TL;DR
This paper introduces CoSER, a centralized adaptive sampling method that reduces joint sampling error in multi-agent reinforcement learning, improving the reliability and convergence of independent policy gradient algorithms.
Contribution
The paper proposes CoSER, an adaptive action sampling approach that reduces joint sampling error, enhancing the reliability of multi-agent policy training.
Findings
CoSER reduces joint sampling error more efficiently than independent sampling.
Reducing sampling error increases the reliability of policy gradient algorithms.
Empirical results show improved convergence in multi-agent games.
Abstract
Independent on-policy policy gradient algorithms are widely used for multi-agent reinforcement learning (MARL) in cooperative and no-conflict games, but they are known to converge sub-optimally when each agent's individual policy gradient points away from an optimal joint equilibrium. Going beyond prior work, we observe that sub-optimal convergence can still arise even when the expected individual policy gradients of each agent point toward the optimal joint solution. After collecting a finite set of trajectories, stochasticity in independent action sampling can cause the joint data distribution to deviate from the expected joint on-policy distribution. This sampling error with respect to the joint on-policy distribution produces inaccurate gradient estimates that can make agents converge sub-optimally. We hypothesize that joint sampling error can be reduced through coordinated action…
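The notion of joint sampling error described in the abstract can be illustrated with a small sketch. It is not the paper's method or its error metric: the function names, the two-agent uniform policies, and the use of total variation distance as the error measure are all assumptions made for illustration. The sketch only shows how a finite batch of independently sampled actions can yield an empirical joint distribution that differs from the expected joint on-policy distribution.

```python
import itertools
import random
from collections import Counter


def joint_probs(policies):
    """Expected joint on-policy distribution when each agent acts independently.

    `policies` is a list of per-agent action-probability lists (hypothetical format).
    """
    joint = {}
    for actions in itertools.product(*(range(len(p)) for p in policies)):
        prob = 1.0
        for agent, action in enumerate(actions):
            prob *= policies[agent][action]
        joint[actions] = prob
    return joint


def sample_joint_actions(policies, n, rng):
    """Draw n joint actions; every agent samples from its own policy independently."""
    return [
        tuple(rng.choices(range(len(p)), weights=p)[0] for p in policies)
        for _ in range(n)
    ]


def joint_sampling_error(policies, samples):
    """Total variation distance between empirical and expected joint distributions.

    Total variation is an assumed stand-in metric, chosen here for simplicity.
    """
    expected = joint_probs(policies)
    counts = Counter(samples)  # missing joint actions count as 0
    n = len(samples)
    return 0.5 * sum(abs(counts[a] / n - p) for a, p in expected.items())


if __name__ == "__main__":
    rng = random.Random(0)
    policies = [[0.5, 0.5], [0.5, 0.5]]  # two agents, two actions each
    for n in (8, 64, 4096):
        samples = sample_joint_actions(policies, n, rng)
        err = joint_sampling_error(policies, samples)
        print(f"n={n:5d}  TV error={err:.3f}")
```

With small batches, the empirical joint distribution will typically sit noticeably away from the expected one, and the gap tends to shrink as the batch grows. The abstract's claim is that this finite-sample gap alone can distort gradient estimates, even when the expected gradients point toward the optimal joint solution.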
