Near-Optimal Last-iterate Convergence of Policy Optimization in Zero-sum Polymatrix Markov games

Zailin Ma; Jiansheng Yang; Zhihua Zhang

arXiv:2308.07873·cs.GT·August 17, 2023·1 cites

TL;DR

This paper introduces ER-OMWU, a policy optimization algorithm with last-iterate convergence guarantees for zero-sum polymatrix Markov games, achieving near-optimal iteration complexity for approximate Nash equilibria.

Contribution

First policy optimization algorithm with convergence guarantees for zero-sum polymatrix Markov games, extending two-player results to multi-player settings with near-optimal complexity.

Findings

01

ER-OMWU finds an $\epsilon$-approximate Nash equilibrium within $\tilde{O}(1/\epsilon)$ iterations.

02

The algorithm is symmetric and nearly uncoupled, suitable for decentralized implementation.

03

Provides the first last-iterate convergence guarantee in this setting.

Abstract

Computing approximate Nash equilibria in multi-player general-sum Markov games is computationally intractable. However, multi-player Markov games with certain cooperative or competitive structures might circumvent this intractability. In this paper, we focus on multi-player zero-sum polymatrix Markov games, where players interact in a pairwise fashion while remaining overall competitive. To the best of our knowledge, we propose the first policy optimization algorithm, called Entropy-Regularized Optimistic-Multiplicative-Weights-Update (ER-OMWU), for finding approximate Nash equilibria in finite-horizon zero-sum polymatrix Markov games with full information feedback. We provide last-iterate convergence guarantees for finding an $\epsilon$-approximate Nash equilibrium within $\tilde{O}(1/\epsilon)$ iterations, which is near-optimal compared to the optimal $O(1/\epsilon)$ iteration…
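To make the idea of an entropy-regularized optimistic multiplicative-weights update concrete, the following is a minimal sketch on a single two-player zero-sum matrix game. It is not the paper's ER-OMWU for polymatrix Markov games: the single-state, two-player setting, the specific update form (a predicted iterate followed by a base iterate, each shrinking the previous log-policy by a factor `1 - eta * tau`), the payoff matrix, and all names (`er_omwu`, `duality_gap`, `eta`, `tau`) are assumptions chosen for illustration.

```python
import math

def log_softmax(logits):
    """Normalize logits into log-probabilities in a numerically stable way."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - lse for v in logits]

def matvec(A, y):
    """Payoff of each row action against column strategy y."""
    return [sum(a * b for a, b in zip(row, y)) for row in A]

def vecmat(x, A):
    """Payoff of each column action against row strategy x."""
    return [sum(x[i] * A[i][j] for i in range(len(A))) for j in range(len(A[0]))]

def mwu_step(log_p, grad, eta, tau, sign):
    """One entropy-regularized MWU step: p_new ∝ p^(1 - eta*tau) * exp(sign*eta*grad)."""
    return log_softmax([(1.0 - eta * tau) * lp + sign * eta * g
                        for lp, g in zip(log_p, grad)])

def er_omwu(A, eta=0.1, tau=0.02, iters=4000):
    """Illustrative entropy-regularized optimistic MWU (assumed form).

    The row player maximizes x^T A y and the column player minimizes it.
    Both players run the same update, using only the opponent's predicted strategy.
    """
    n, m = len(A), len(A[0])
    log_x = [-math.log(n)] * n          # base iterates, stored as log-probabilities
    log_y = [-math.log(m)] * m
    x_bar = [1.0 / n] * n               # predicted (optimistic) iterates
    y_bar = [1.0 / m] * m
    for _ in range(iters):
        # Prediction step: respond to the opponent's previous predicted strategy.
        gx, gy = matvec(A, y_bar), vecmat(x_bar, A)
        x_bar = [math.exp(v) for v in mwu_step(log_x, gx, eta, tau, +1.0)]
        y_bar = [math.exp(v) for v in mwu_step(log_y, gy, eta, tau, -1.0)]
        # Base step: respond to the opponent's new predicted strategy.
        gx, gy = matvec(A, y_bar), vecmat(x_bar, A)
        log_x = mwu_step(log_x, gx, eta, tau, +1.0)
        log_y = mwu_step(log_y, gy, eta, tau, -1.0)
    return [math.exp(v) for v in log_x], [math.exp(v) for v in log_y]

def duality_gap(A, x, y):
    """Best-response gap in the unregularized game; zero exactly at a Nash equilibrium."""
    return max(matvec(A, y)) - min(vecmat(x, A))

if __name__ == "__main__":
    A = [[2.0, -1.0], [-1.0, 1.0]]      # hypothetical payoff matrix
    x, y = er_omwu(A)
    print("row strategy:", x)
    print("column strategy:", y)
    print("duality gap:", duality_gap(A, x, y))
```

In this sketch the last iterate `(x, y)` itself is returned rather than an average over iterates, which is the sense of "last-iterate" used above. The regularization weight `tau` trades accuracy for stability: the fixed point of the update is the equilibrium of the entropy-regularized game, so a smaller `tau` brings it closer to a Nash equilibrium of the original game.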

Taxonomy

Topics: Reinforcement Learning in Robotics · Markov Chains and Monte Carlo Methods · Adversarial Robustness in Machine Learning