Centralized Adaptive Sampling for Reliable Co-Training of Independent Multi-Agent Policies

Nicholas E. Corrado; Josiah P. Hanna

arXiv:2508.01049·cs.LG·May 14, 2026

TL;DR

This paper introduces CoSER, a centralized adaptive sampling method that reduces joint sampling error in multi-agent reinforcement learning, improving the reliability and convergence of independent policy gradient algorithms.

Contribution

The paper proposes CoSER, an adaptive action sampling approach that reduces joint sampling error, enhancing the reliability of multi-agent policy training.

Findings

01

CoSER reduces joint sampling error more efficiently than independent sampling.

02

Reducing sampling error increases the reliability of policy gradient algorithms.

03

Empirical results show improved convergence in multi-agent games.

Abstract

Independent on-policy policy gradient algorithms are widely used for multi-agent reinforcement learning (MARL) in cooperative and no-conflict games, but they are known to converge sub-optimally when each agent's individual policy gradient points away from an optimal joint equilibrium. Going beyond prior work, we observe that sub-optimal convergence can still arise even when the expected individual policy gradients of each agent point toward the optimal joint solution. After collecting a finite set of trajectories, stochasticity in independent action sampling can cause the joint data distribution to deviate from the expected joint on-policy distribution. This sampling error w.r.t. the joint on-policy distribution produces inaccurate gradient estimates that can make agents converge sub-optimally. We hypothesize that joint sampling error can be reduced through coordinated action…
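To make joint sampling error concrete, here is a minimal Python sketch under stated assumptions. It uses two hypothetical agents with uniform policies over actions "a" and "b", measures sampling error as the total variation distance between the empirical joint action distribution and the product of the agents' policies (the paper may define it differently), and compares independent per-agent sampling with a hypothetical greedy "largest deficit" centralized sampler. That greedy sampler is a generic stand-in for coordinated action sampling, not CoSER; the excerpt above is cut off before it describes CoSER's mechanism. All names (POLICIES, joint_on_policy, sampling_error, independent_sampling, greedy_coordinated_sampling) are illustrative.

```python
import itertools
import random
from collections import Counter

# HYPOTHETICAL example policies: two agents, each uniform over actions "a" and "b".
POLICIES = [
    {"a": 0.5, "b": 0.5},
    {"a": 0.5, "b": 0.5},
]


def joint_on_policy(policies):
    """Expected joint on-policy distribution: product of the independent policies."""
    joint = {}
    for combo in itertools.product(*(p.keys() for p in policies)):
        prob = 1.0
        for policy, action in zip(policies, combo):
            prob *= policy[action]
        joint[combo] = prob
    return joint


def sampling_error(samples, joint):
    """ASSUMED metric: total variation distance between empirical and expected joint."""
    counts = Counter(samples)
    n = len(samples)
    return 0.5 * sum(abs(counts[a] / n - p) for a, p in joint.items())


def independent_sampling(policies, n, rng):
    """Each agent draws its own action independently from its policy."""
    samples = []
    for _ in range(n):
        combo = tuple(
            rng.choices(list(p.keys()), weights=list(p.values()))[0] for p in policies
        )
        samples.append(combo)
    return samples


def greedy_coordinated_sampling(joint, n):
    """HYPOTHETICAL centralized sampler (NOT CoSER): at each step, choose the joint
    action whose count lags furthest behind its expected count t * p(a)."""
    counts = Counter()
    samples = []
    for t in range(1, n + 1):
        action = max(joint, key=lambda a: t * joint[a] - counts[a])
        counts[action] += 1
        samples.append(action)
    return samples


if __name__ == "__main__":
    rng = random.Random(0)
    joint = joint_on_policy(POLICIES)
    n = 8
    print("independent:", sampling_error(independent_sampling(POLICIES, n, rng), joint))
    # Prints 0.0: each of the four joint actions is drawn exactly twice.
    print("coordinated:", sampling_error(greedy_coordinated_sampling(joint, n), joint))
```

With 8 samples the greedy sampler hits each joint action exactly twice, so its empirical joint distribution matches the on-policy one, while independent sampling usually over- or under-represents some joint actions even though every agent samples from its own policy correctly. The sketch ignores states, trajectories, and the policy-gradient update; it only illustrates why coordinating action selection across agents can shrink joint sampling error.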
