Convergence Rates of Average-Reward Multi-agent Reinforcement Learning via Randomized Linear Programming
Alec Koppel, Amrit Singh Bedi, Bhargav Ganguly, Vaneet Aggarwal

TL;DR
This paper analyzes convergence rates for multi-agent reinforcement learning under the average-reward criterion, proposing a linear programming approach with optimal sample complexity guarantees and validating it through experiments.
Contribution
It introduces a multi-agent linear programming framework with stochastic primal-dual methods, achieving near-optimal sample complexity for average-reward MARL.
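For orientation, the textbook single-agent LP view of an average-reward MDP, and the saddle-point problem that stochastic primal-dual methods target, can be sketched as follows. This is the standard formulation rather than the paper's own notation: the symbols \lambda (average reward), v (differential value function), \mu (state-action occupancy measure), r, P, and the simplex \Delta are assumptions introduced here. In the multi-agent setting described in the abstract, r would be the sum of the agents' local rewards.

```latex
\begin{aligned}
\text{(primal)}\quad & \min_{\lambda,\, v}\ \lambda
  \quad \text{s.t.}\quad \lambda + v(s) \ \ge\ r(s,a) + \sum_{s'} P(s' \mid s,a)\, v(s')
  \quad \forall (s,a), \\
\text{(dual)}\quad & \max_{\mu \ge 0}\ \sum_{s,a} \mu(s,a)\, r(s,a)
  \quad \text{s.t.}\quad \sum_{a} \mu(s',a) = \sum_{s,a} P(s' \mid s,a)\, \mu(s,a)\ \ \forall s',
  \quad \sum_{s,a} \mu(s,a) = 1, \\
\text{(saddle point)}\quad & \min_{v}\ \max_{\mu \in \Delta}\ \sum_{s,a} \mu(s,a)
  \Big( r(s,a) + \sum_{s'} P(s' \mid s,a)\, v(s') - v(s) \Big).
\end{aligned}
```

The saddle-point form follows from attaching multipliers \mu(s,a) \ge 0 to the primal constraints and minimizing over \lambda, which forces \mu to sum to one.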
Findings
Sample complexity scales optimally with state and action space sizes.
Multi-agent LP approach converges to near-globally optimal solutions.
Experimental results support theoretical convergence guarantees.
Abstract
In tabular multi-agent reinforcement learning with average-cost criterion, a team of agents sequentially interacts with the environment and observes local incentives. We focus on the case in which the global reward is a sum of local rewards, the joint policy factorizes into agents' marginals, and the state is fully observable. To date, few global optimality guarantees exist even for this simple setting, as most results yield convergence to stationarity for parameterized policies in large/possibly continuous spaces. To solidify the foundations of MARL, we build upon linear programming (LP) reformulations, for which stochastic primal-dual methods yield a model-free approach to achieve optimal sample complexity in the centralized case. We develop multi-agent extensions, whereby agents solve their local saddle point problems and then perform local weighted averaging. We establish that the…
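The "local saddle point step, then local weighted averaging" pattern mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it uses exact gradients with a known transition tensor `P` instead of sampled transitions, and it assumes that both the value estimates `v` and the occupancy estimates `mu` are averaged. The function names, step sizes, clipping bound, and ring-shaped mixing matrix `W` are all hypothetical choices made for illustration.

```python
import numpy as np


def local_saddle_step(v, mu, r_i, P, alpha=0.1, beta=0.1, v_bound=10.0):
    """One exact-gradient primal-dual step on an agent's local Lagrangian.

    L_i(v, mu) = sum_{s,a} mu[s, a] * (r_i[s, a] + P[s, a, :] @ v - v[s])
    v is minimized by projected gradient descent; mu is maximized by
    exponentiated-gradient ascent, which keeps it on the simplex.
    (Illustrative assumption: the paper's updates are sample-based.)
    """
    # Gradient of L_i with respect to mu, shape (S, A).
    adv = r_i + P @ v - v[:, None]
    # Gradient of L_i with respect to v: expected next-state distribution
    # under mu minus the state marginal of mu, shape (S,).
    grad_v = np.einsum("sa,sat->t", mu, P) - mu.sum(axis=1)
    v_new = np.clip(v - alpha * grad_v, -v_bound, v_bound)
    mu_new = mu * np.exp(beta * adv)
    mu_new /= mu_new.sum()
    return v_new, mu_new


def weighted_average(values, W):
    """Mix estimates across agents: result[i] = sum_j W[i, j] * values[j]."""
    return np.tensordot(W, values, axes=(1, 0))


# Tiny hypothetical problem: 3 states, 2 actions, 4 agents on a ring.
rng = np.random.default_rng(0)
S, A, N = 3, 2, 4
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)          # each P[s, a, :] is a distribution
r = rng.random((N, S, A))                  # local rewards, one table per agent
eye = np.eye(N)
# Symmetric, doubly stochastic mixing matrix for a 4-agent ring (assumption).
W = 0.5 * eye + 0.25 * np.roll(eye, 1, axis=1) + 0.25 * np.roll(eye, -1, axis=1)

v = np.zeros((N, S))
mu = np.full((N, S, A), 1.0 / (S * A))
for _ in range(50):
    for i in range(N):
        v[i], mu[i] = local_saddle_step(v[i], mu[i], r[i], P)
    v = weighted_average(v, W)             # local weighted averaging
    mu = weighted_average(mu, W)
```

Because each row of `W` is a set of convex weights, averaging keeps every agent's `mu` a valid distribution, and a doubly stochastic `W` preserves the network-wide mean of the estimates while shrinking disagreement between agents.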
Taxonomy
Topics: Reinforcement Learning in Robotics
