Near-Optimal Last-iterate Convergence of Policy Optimization in Zero-sum Polymatrix Markov games
Zailin Ma, Jiansheng Yang, Zhihua Zhang

TL;DR
This paper introduces ER-OMWU, a policy optimization algorithm with last-iterate convergence guarantees for zero-sum polymatrix Markov games, achieving near-optimal iteration complexity for approximate Nash equilibria.
Contribution
First policy optimization algorithm with convergence guarantees for zero-sum polymatrix Markov games, extending two-player results to multi-player settings with near-optimal complexity.
Findings
ER-OMWU finds an $\epsilon$-approximate Nash equilibrium within $\tilde{O}(1/\epsilon)$ iterations.
The algorithm is symmetric and nearly uncoupled, suitable for decentralized implementation.
The analysis gives the first last-iterate convergence guarantee for policy optimization in this setting.
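The style of update behind ER-OMWU can be illustrated on the simplest case: a single-state (one-shot) zero-sum polymatrix game. This is a sketch under stated assumptions, not the authors' algorithm. It uses the common two-step entropy-regularized optimistic MWU form $\bar{x}_i^{t+1} \propto (x_i^t)^{1-\eta\tau}\exp(\eta g_i(\bar{x}^t))$ and $x_i^{t+1} \propto (x_i^t)^{1-\eta\tau}\exp(\eta g_i(\bar{x}^{t+1}))$, where $g_i$ is player $i$'s expected payoff per action summed over its neighbours, $\eta$ is a step size and $\tau$ the entropy weight. The values of $\eta$ and $\tau$, the example game, and all function names are hypothetical. The paper's finite-horizon Markov-game version also has to handle states and the horizon, which this sketch omits. Each player's update reads only its own policy and the payoff vector it receives from its neighbours, which shows why such updates lend themselves to decentralized implementation.

```python
import math
from typing import Dict, List, Tuple

Matrix = List[List[float]]
Policy = List[float]
Edges = Dict[Tuple[int, int], Matrix]  # (i, j) -> A_ij, payoff matrix of i against j


def matvec(a: Matrix, v: Policy) -> List[float]:
    """Return the matrix-vector product a @ v."""
    return [sum(a_rc * v_c for a_rc, v_c in zip(row, v)) for row in a]


def softmax(logits: List[float]) -> Policy:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def add_reverse_edges(upper: Edges) -> Edges:
    """Complete each edge with A_ji = -A_ij^T so every pairwise game is zero-sum."""
    edges = dict(upper)
    for (i, j), a in upper.items():
        edges[(j, i)] = [[-a[r][c] for r in range(len(a))] for c in range(len(a[0]))]
    return edges


def gradient(i: int, profile: List[Policy], edges: Edges) -> List[float]:
    """g_i = sum over neighbours j of A_ij x_j (player i's expected payoff per action)."""
    g = [0.0] * len(profile[i])
    for (p, q), a in edges.items():
        if p == i:
            g = [gk + vk for gk, vk in zip(g, matvec(a, profile[q]))]
    return g


def regularized_mwu(x: Policy, g: List[float], eta: float, tau: float) -> Policy:
    """Normalize x^(1 - eta*tau) * exp(eta * g), computed in log space."""
    return softmax([(1.0 - eta * tau) * math.log(p) + eta * gk for p, gk in zip(x, g)])


def er_omwu(init: List[Policy], edges: Edges, eta: float = 0.1,
            tau: float = 0.2, iters: int = 2000) -> List[Policy]:
    """Hypothetical single-state ER-OMWU sketch; returns the last iterate."""
    n = len(init)
    x = [list(p) for p in init]      # main iterates x^t (must be strictly positive)
    x_bar = [list(p) for p in init]  # optimistic midpoints, with x_bar^0 = x^0
    for _ in range(iters):
        # Midpoint step: reuse the payoff vector evaluated at the previous midpoint.
        g_old = [gradient(i, x_bar, edges) for i in range(n)]
        x_bar = [regularized_mwu(x[i], g_old[i], eta, tau) for i in range(n)]
        # Main step: both steps start from x^t; this one uses the new midpoint.
        g_new = [gradient(i, x_bar, edges) for i in range(n)]
        x = [regularized_mwu(x[i], g_new[i], eta, tau) for i in range(n)]
    return x


if __name__ == "__main__":
    # Three players on a cycle, each edge a matching-pennies game.
    pennies = [[1.0, -1.0], [-1.0, 1.0]]
    edges = add_reverse_edges({(0, 1): pennies, (1, 2): pennies, (2, 0): pennies})
    final = er_omwu([[0.9, 0.1], [0.3, 0.7], [0.6, 0.4]], edges)
    print([[round(p, 4) for p in pol] for pol in final])
```

In this cyclic matching-pennies example every neighbour's payoff vector vanishes at the uniform profile, so uniform play is the fixed point the last iterate should approach. With $\tau > 0$ the dynamics target an equilibrium of the entropy-regularized game. A smaller $\tau$ brings that point closer to an equilibrium of the original game, at the cost of slower contraction.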
Abstract
Computing approximate Nash equilibria in multi-player general-sum Markov games is computationally intractable. However, multi-player Markov games with certain cooperative or competitive structures might circumvent this intractability. In this paper, we focus on multi-player zero-sum polymatrix Markov games, where players interact pairwise while remaining competitive overall. We propose Entropy-Regularized Optimistic-Multiplicative-Weights-Update (ER-OMWU), to the best of our knowledge the first policy optimization algorithm for finding approximate Nash equilibria in finite-horizon zero-sum polymatrix Markov games with full-information feedback. We provide last-iterate convergence guarantees for finding an $\epsilon$-approximate Nash equilibrium within $\tilde{O}(1/\epsilon)$ iterations, which is near-optimal compared to the optimal iteration…
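The two notions in the abstract, a zero-sum polymatrix game and an $\epsilon$-approximate Nash equilibrium, can be made concrete with a small single-state sketch. It assumes the standard polymatrix payoff form $u_i(x) = \sum_{j \neq i} x_i^\top A_{ij} x_j$ with one matrix per directed edge; the abstract does not spell out this payoff model, and the paper's Markov-game setting adds states and a horizon. The sketch checks that payoffs sum to zero on every pure joint action (enough for mixed profiles too, since payoffs are multilinear) and computes the Nash gap, the largest gain any single player could obtain by deviating. A profile is an $\epsilon$-approximate Nash equilibrium when this gap is at most $\epsilon$. The function names and the example game are hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple

Matrix = List[List[float]]
Edges = Dict[Tuple[int, int], Matrix]  # (i, j) -> A_ij, payoff matrix of i against j


def matvec(a: Matrix, v: List[float]) -> List[float]:
    """Return the matrix-vector product a @ v."""
    return [sum(x * y for x, y in zip(row, v)) for row in a]


def gradient(i: int, profile: List[List[float]], edges: Edges) -> List[float]:
    """Player i's expected payoff for each of its actions, summed over neighbours."""
    g = [0.0] * len(profile[i])
    for (p, q), a in edges.items():
        if p == i:
            g = [gk + vk for gk, vk in zip(g, matvec(a, profile[q]))]
    return g


def payoffs(profile: List[List[float]], edges: Edges) -> List[float]:
    """u_i = x_i . g_i for every player i."""
    return [sum(x * y for x, y in zip(profile[i], gradient(i, profile, edges)))
            for i in range(len(profile))]


def is_zero_sum(edges: Edges, n_actions: List[int], tol: float = 1e-9) -> bool:
    """Check sum_i u_i = 0 on every pure joint action."""
    for joint in product(*(range(m) for m in n_actions)):
        profile = [[1.0 if k == a else 0.0 for k in range(m)]
                   for a, m in zip(joint, n_actions)]
        if abs(sum(payoffs(profile, edges))) > tol:
            return False
    return True


def nash_gap(profile: List[List[float]], edges: Edges) -> float:
    """max_i (best-response payoff - current payoff); <= eps means eps-approximate NE."""
    gaps = []
    for i, x in enumerate(profile):
        g = gradient(i, profile, edges)
        gaps.append(max(g) - sum(a * b for a, b in zip(x, g)))
    return max(gaps)


if __name__ == "__main__":
    pennies = [[1.0, -1.0], [-1.0, 1.0]]
    neg_t = [[-1.0, 1.0], [1.0, -1.0]]  # -pennies^T, the reverse-edge matrix
    edges = {(0, 1): pennies, (1, 0): neg_t, (1, 2): pennies,
             (2, 1): neg_t, (2, 0): pennies, (0, 2): neg_t}
    print(is_zero_sum(edges, [2, 2, 2]))
    print(nash_gap([[0.5, 0.5]] * 3, edges))
    print(nash_gap([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]], edges))
```

Setting $A_{ji} = -A_{ij}^\top$ on every edge makes each pairwise interaction zero-sum and hence the whole game zero-sum. In the example, uniform play has zero gap, while the pure profile shown leaves player 1 able to gain 4 by switching actions.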
Taxonomy
Topics: Reinforcement Learning in Robotics · Markov Chains and Monte Carlo Methods · Adversarial Robustness in Machine Learning
