Provably Fast Convergence of Independent Natural Policy Gradient for Markov Potential Games
Youbang Sun, Tao Liu, Ruida Zhou, P. R. Kumar, Shahin Shahrampour

TL;DR
This paper proves that an independent natural policy gradient algorithm converges quickly to an approximate Nash equilibrium in multi-agent Markov potential games, matching single-agent convergence rates.
Contribution
It establishes an $\mathcal{O}(1/\epsilon)$ iteration bound for independent NPG to reach an $\epsilon$-Nash equilibrium in Markov potential games, improving over the previous best bound of $\mathcal{O}(1/\epsilon^2)$.
Findings
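Convergence to an $\epsilon$-NE within $\mathcal{O}(1/\epsilon)$ iterations
Theoretical bounds verified empirically on a synthetic potential game and a congestion game
Matches the $\mathcal{O}(1/\epsilon)$ iteration complexity achievable in the single-agent case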
Abstract
This work studies an independent natural policy gradient (NPG) algorithm for the multi-agent reinforcement learning problem in Markov potential games. It is shown that, under mild technical assumptions and the introduction of the suboptimality gap, the independent NPG method with an oracle providing exact policy evaluation asymptotically reaches an $\epsilon$-Nash Equilibrium (NE) within $\mathcal{O}(1/\epsilon)$ iterations. This improves upon the previous best result of $\mathcal{O}(1/\epsilon^2)$ iterations and is of the same order, $\mathcal{O}(1/\epsilon)$, that is achievable for the single-agent case. Empirical results for a synthetic potential game and a congestion game are presented to verify the theoretical bounds.
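To make the update rule concrete, here is a minimal sketch of independent NPG in the simplest setting one can write down: a two-player, single-state, identical-interest matrix game, which is a special case of a potential game. It is an illustration under stated assumptions, not the paper's experimental code. The function name `independent_npg`, the payoff matrix `phi`, and the step size `eta` are all hypothetical. Under a softmax parameterization, the NPG step reduces to a multiplicative-weights update on each agent's own advantage, computed exactly here in place of the policy-evaluation oracle.

```python
import numpy as np

def independent_npg(phi, eta=1.0, iters=50):
    """Independent NPG on a two-player identical-interest matrix game.

    Assumptions (not from the paper's experiments): a single state, a shared
    payoff matrix `phi` acting as the potential, and exact advantage values.
    """
    n1, n2 = phi.shape
    pi1 = np.full(n1, 1.0 / n1)  # agent 1 starts from the uniform policy
    pi2 = np.full(n2, 1.0 / n2)  # agent 2 starts from the uniform policy
    gaps = []
    for _ in range(iters):
        # Expected payoff of each own action against the other agent's policy.
        q1 = phi @ pi2
        q2 = phi.T @ pi1
        # Advantage: action value minus the value of the current policy.
        adv1 = q1 - pi1 @ q1
        adv2 = q2 - pi2 @ q2
        # NE-gap: the largest gain any agent gets from a unilateral deviation.
        gaps.append(max(adv1.max(), adv2.max()))
        # Softmax NPG step: reweight by exp(eta * advantage), then normalize.
        # Each agent uses only its own advantage (independent learning).
        pi1 = pi1 * np.exp(eta * adv1)
        pi1 /= pi1.sum()
        pi2 = pi2 * np.exp(eta * adv2)
        pi2 /= pi2.sum()
    return pi1, pi2, np.array(gaps)

# Hypothetical 2x2 potential with two pure equilibria of different value.
phi = np.array([[1.0, 0.0],
                [0.0, 0.5]])
pi1, pi2, gaps = independent_npg(phi)
```

Plotting `gaps` against the iteration index is a quick way to inspect how fast the NE-gap shrinks in this toy example. The general Markov setting additionally involves state-dependent advantages and a discount factor, which this sketch omits.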
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Advanced Bandit Algorithms Research
