Last-Iterate Convergence of Payoff-Based Independent Learning in Zero-Sum Stochastic Games
Zaiwei Chen, Kaiqing Zhang, Eric Mazumdar, Asuman Ozdaglar, and Adam Wierman

TL;DR
This paper introduces payoff-based learning dynamics for zero-sum stochastic games that are proven to converge with finite-sample guarantees, providing new insights into the sample complexity of finding Nash equilibria.
Contribution
It presents the first finite-sample analysis of last-iterate convergence for payoff-based learning in zero-sum stochastic games, with novel Lyapunov-based techniques.
Findings
Sample complexity of O(ε^{-1}) for Nash distribution in matrix games
Sample complexity of O(ε^{-8}) for Nash equilibrium in matrix and stochastic games
First finite-sample last-iterate convergence guarantees for these learning dynamics
Abstract
In this paper, we consider two-player zero-sum matrix and stochastic games and develop learning dynamics that are payoff-based, convergent, rational, and symmetric between the two players. Specifically, the learning dynamics for matrix games are based on the smoothed best-response dynamics, while the learning dynamics for stochastic games build upon those for matrix games, with additional incorporation of the minimax value iteration. To our knowledge, our theoretical results present the first finite-sample analysis of such learning dynamics with last-iterate guarantees. In the matrix game setting, the results imply a sample complexity of O(ε^{-1}) to find the Nash distribution and a sample complexity of O(ε^{-8}) to find a Nash equilibrium. In the stochastic game setting, the results also imply a sample complexity of O(ε^{-8}) to find a Nash equilibrium. To…
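The smoothed best-response idea mentioned in the abstract can be illustrated with a minimal sketch for a matrix game. This is not the paper's algorithm as stated; it is a simplified illustration under assumptions. Each player sees only its own realized payoff, keeps a running payoff estimate per action, and nudges its policy toward the softmax of those estimates. The names `softmax` and `run_dynamics`, the temperature `tau`, and the step sizes `alpha` and `beta` are all hypothetical choices made for this example.

```python
import math
import random


def softmax(q, tau):
    """Smoothed best response: softmax of payoff estimates at temperature tau."""
    m = max(q)  # subtract the max for numerical stability
    w = [math.exp((x - m) / tau) for x in q]
    s = sum(w)
    return [x / s for x in w]


def run_dynamics(payoff, steps=20000, tau=0.5, alpha=0.05, beta=0.01, seed=0):
    """Illustrative payoff-based dynamics for a zero-sum matrix game.

    payoff[a][b] is the row player's payoff; the column player receives
    the negative. All hyperparameters are assumptions for illustration.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(payoff), len(payoff[0])
    # Both players start from the uniform policy and zero payoff estimates.
    pi = [[1.0 / n_rows] * n_rows, [1.0 / n_cols] * n_cols]
    q = [[0.0] * n_rows, [0.0] * n_cols]
    for _ in range(steps):
        # Each player samples an action from its current policy.
        a = rng.choices(range(n_rows), weights=pi[0])[0]
        b = rng.choices(range(n_cols), weights=pi[1])[0]
        r = payoff[a][b]
        actions = (a, b)
        rewards = (r, -r)  # zero-sum: payoffs are negatives of each other
        for i in range(2):
            # Move the policy a small step toward the smoothed best response.
            target = softmax(q[i], tau)
            pi[i] = [p + beta * (t - p) for p, t in zip(pi[i], target)]
            # Payoff-based update: only the played action's estimate changes,
            # and only the player's own realized payoff is used.
            act = actions[i]
            q[i][act] += alpha * (rewards[i] - q[i][act])
    return pi, q


if __name__ == "__main__":
    # Matching pennies, used here only as a small test game.
    matching_pennies = [[1.0, -1.0], [-1.0, 1.0]]
    policies, estimates = run_dynamics(matching_pennies)
    print("row policy:", policies[0])
    print("column policy:", policies[1])
```

The two update rules are symmetric across players, and neither player observes the opponent's action or policy, which is the sense in which such dynamics are payoff-based. The sketch makes no claim about convergence rates; those depend on the step-size and temperature choices analyzed in the paper.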
Taxonomy
Topics: Machine Learning and Algorithms · Stochastic Gradient Optimization Techniques · Advanced Bandit Algorithms Research
