Last-Iterate Convergence of Payoff-Based Independent Learning in Zero-Sum Stochastic Games

Zaiwei Chen; Kaiqing Zhang; Eric Mazumdar; Asuman Ozdaglar; Adam Wierman

arXiv:2409.01447·cs.LG·September 6, 2024

Open Access

TL;DR

This paper introduces payoff-based learning dynamics for zero-sum stochastic games that are proven to converge with finite-sample guarantees, providing new insights into the sample complexity of finding Nash equilibria.

Contribution

It presents the first finite-sample analysis of last-iterate convergence for payoff-based learning in zero-sum stochastic games, with novel Lyapunov-based techniques.

Findings

01 Sample complexity of O(ε^{-1}) for Nash distribution in matrix games

02 Sample complexity of O(ε^{-8}) for Nash equilibrium in matrix and stochastic games

03 First finite-sample last-iterate convergence guarantees for these learning dynamics

Abstract

In this paper, we consider two-player zero-sum matrix and stochastic games and develop learning dynamics that are payoff-based, convergent, rational, and symmetric between the two players. Specifically, the learning dynamics for matrix games are based on the smoothed best-response dynamics, while the learning dynamics for stochastic games build upon those for matrix games, with additional incorporation of the minimax value iteration. To our knowledge, our theoretical results present the first finite-sample analysis of such learning dynamics with last-iterate guarantees. In the matrix game setting, the results imply a sample complexity of $O(\epsilon^{-1})$ to find the Nash distribution and a sample complexity of $O(\epsilon^{-8})$ to find a Nash equilibrium. In the stochastic game setting, the results also imply a sample complexity of $O(\epsilon^{-8})$ to find a Nash equilibrium. To…
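To make the idea of payoff-based smoothed best-response dynamics in a matrix game more concrete, here is a minimal sketch. It is not the paper's algorithm, which this page does not reproduce. The function names (`softmax`, `smoothed_br_dynamics`), the temperature `tau`, the step sizes `alpha` and `beta`, and the specific update order are all assumptions chosen for illustration. Each player sees only its own action and its own realized payoff, and both players run the same rule.

```python
import math
import random


def softmax(values, tau):
    """Smoothed best response: softmax of payoff estimates at temperature tau."""
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp((v - m) / tau) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def smoothed_br_dynamics(payoff, steps=5000, tau=0.5, alpha=0.05, beta=0.01, seed=0):
    """Illustrative payoff-based dynamics for a zero-sum matrix game.

    payoff[i][j] is the row player's payoff; the column player receives
    its negative. All hyperparameters here are hypothetical defaults.
    """
    rng = random.Random(seed)
    n_row, n_col = len(payoff), len(payoff[0])
    # Policies start uniform; per-action payoff estimates start at zero.
    pi = [[1.0 / n_row] * n_row, [1.0 / n_col] * n_col]
    q = [[0.0] * n_row, [0.0] * n_col]

    for _ in range(steps):
        # Each player samples an action from its own current policy.
        a_row = rng.choices(range(n_row), weights=pi[0])[0]
        a_col = rng.choices(range(n_col), weights=pi[1])[0]
        r = payoff[a_row][a_col]
        actions = (a_row, a_col)
        rewards = (r, -r)  # zero-sum: payoffs are negatives of each other

        for i in range(2):
            # Move the policy a small step toward the smoothed best response.
            target = softmax(q[i], tau)
            pi[i] = [p + beta * (t - p) for p, t in zip(pi[i], target)]
            # Update only the estimate of the action actually played,
            # using only the player's own realized payoff.
            a = actions[i]
            q[i][a] += alpha * (rewards[i] - q[i][a])

    return pi, q


if __name__ == "__main__":
    matching_pennies = [[1.0, -1.0], [-1.0, 1.0]]
    policies, estimates = smoothed_br_dynamics(matching_pennies)
    print("row policy:", policies[0])
    print("column policy:", policies[1])
```

Because the policy update is a convex combination of the current policy and a softmax vector, each policy remains a valid probability distribution throughout. The fixed point of softmax responses is what the abstract calls the Nash distribution, which generally differs from an exact Nash equilibrium unless the temperature is driven toward zero.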

Taxonomy

Topics: Machine Learning and Algorithms · Stochastic Gradient Optimization Techniques · Advanced Bandit Algorithms Research