Learning Approximate Nash Equilibria in Cooperative Multi-Agent Reinforcement Learning via Mean-Field Subsampling
Emile Anand, Ishani Karmarkar

TL;DR
This paper introduces a novel alternating learning framework for cooperative multi-agent reinforcement learning with limited observability, achieving convergence to approximate Nash equilibria.
Contribution
It proposes ALTERNATING-MARL, a method that combines subsampled mean-field Q-learning with local policy updates, and provides theoretical convergence guarantees.
Findings
Converges to an $\tilde{O}(1/\sqrt{k})$-approximate Nash equilibrium.
Separates sample complexities between joint state and action spaces.
Validated through multi-robot control simulations.
Abstract
Many large-scale platforms and networked control systems have a centralized decision maker interacting with a massive population of agents under strict observability constraints. Motivated by such applications, we study a cooperative Markov game with a global agent and homogeneous local agents in a communication-constrained regime, where the global agent only observes a subset of $k$ local agent states per time step. We propose an alternating learning framework, ALTERNATING-MARL, where the global agent performs subsampled mean-field $Q$-learning against a fixed local policy, and local agents update by optimizing in an induced MDP. We prove that these approximate best-response dynamics converge to an $\tilde{O}(1/\sqrt{k})$-approximate Nash Equilibrium, while separating the sample complexities between the joint state and action spaces. Finally, we validate our results…
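The subsampling step in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names, the integer coding of local states, uniform sampling without replacement, and the use of total variation distance are all assumptions made for illustration. It draws $k$ of $n$ local agent states, forms their empirical distribution (the subsampled mean field), and measures how far that is from the full-population distribution.

```python
import numpy as np

def empirical_distribution(states, num_states):
    """Empirical distribution (mean field) of integer-coded agent states."""
    counts = np.bincount(states, minlength=num_states)
    return counts / len(states)

def subsampled_mean_field(states, k, num_states, rng):
    """Observe k agents chosen uniformly without replacement (an assumption)
    and return the empirical distribution of their states."""
    idx = rng.choice(len(states), size=k, replace=False)
    return empirical_distribution(states[idx], num_states)

def tv_distance(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * np.abs(p - q).sum()

def mean_tv_error(states, k, num_states, rng, trials=200):
    """Average TV error of the k-subsampled mean field over repeated draws."""
    full = empirical_distribution(states, num_states)
    errors = [
        tv_distance(subsampled_mean_field(states, k, num_states, rng), full)
        for _ in range(trials)
    ]
    return float(np.mean(errors))

# Hypothetical population: n local agents, each in one of 5 states.
rng = np.random.default_rng(0)
n, num_states = 10_000, 5
local_states = rng.integers(0, num_states, size=n)

for k in (10, 100, 1000):
    print(k, mean_tv_error(local_states, k, num_states, rng))
```

In this toy setting the printed error shrinks as $k$ grows, which gives some intuition for why the approximation guarantee depends on $k$. The sketch covers only the observation model, not the $Q$-learning or the local policy updates.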
