Finite-Sample Guarantees for Learning Dynamics in Zero-Sum Polymatrix Games
Fathima Zarin Faizal, Asuman Ozdaglar, Martin J. Wainwright

TL;DR
This paper establishes finite-sample convergence guarantees for learning dynamics in zero-sum polymatrix games under two information scenarios, using a two-timescale approach combining smoothed best-response and TD-learning.
Contribution
It introduces a novel two-timescale learning dynamic with finite-sample guarantees for zero-sum polymatrix games, especially in the minimal information setting.
Findings
Polynomial-time convergence to an ε-Nash equilibrium
Finite-sample guarantees established for both information settings
Effective learning dynamics without additional exploration
Abstract
We study best-response type learning dynamics for zero-sum polymatrix games under two information settings. The two settings are distinguished by the type of information that each player has about the game and their opponents' strategies. The first setting is the full information case, in which each player knows their own and their opponents' payoff matrices and observes everyone's mixed strategies. The second setting is the minimal information case, where players do not observe their opponents' strategies and are not aware of any payoff matrices (instead they only observe their realized payoffs). For this setting, also known as the radically uncoupled case in the learning in games literature, we study two-timescale learning dynamics that combine smoothed best-response type updates for strategy estimates with a TD-learning update to estimate a local payoff function. For these dynamics,…
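To make the minimal information setting concrete, here is a rough Python sketch of a two-timescale dynamic in a two-player zero-sum matrix game, the simplest special case of a polymatrix game. It is an illustration only, not the paper's algorithm: the function names, the constant step sizes `alpha` and `beta`, the temperature `tau`, and the choice to update the payoff estimate on the faster timescale are all assumptions made for this example. Each player sees only its own sampled action and realized payoff, keeps a local estimate `q` of the payoff of each action (a TD-style update), and moves its mixed strategy slowly toward a softmax (smoothed best response) of that estimate.

```python
import numpy as np


def softmax(q, tau):
    """Smoothed best response: softmax of payoff estimates at temperature tau."""
    z = (q - q.max()) / tau  # subtract the max for numerical stability
    w = np.exp(z)
    return w / w.sum()


def run_dynamics(A, steps=5000, tau=0.5, alpha=0.1, beta=0.01, seed=0):
    """Hypothetical two-timescale dynamic for a two-player zero-sum matrix game.

    A is player 1's payoff matrix; player 2 receives the negated payoff.
    Assumed step sizes: alpha (payoff estimate, faster) and beta (strategy, slower).
    """
    rng = np.random.default_rng(seed)
    A = np.asarray(A, dtype=float)
    n1, n2 = A.shape
    pi1 = np.full(n1, 1.0 / n1)  # mixed strategies start uniform
    pi2 = np.full(n2, 1.0 / n2)
    q1 = np.zeros(n1)            # local payoff estimates start at zero
    q2 = np.zeros(n2)
    for _ in range(steps):
        # Each player samples an action from its own current strategy.
        a1 = rng.choice(n1, p=pi1)
        a2 = rng.choice(n2, p=pi2)
        # Each player observes only its own realized payoff.
        r1 = A[a1, a2]
        r2 = -r1
        # TD-style update of the estimate for the action actually played.
        q1[a1] += alpha * (r1 - q1[a1])
        q2[a2] += alpha * (r2 - q2[a2])
        # Slow move of the strategy toward the smoothed best response.
        pi1 += beta * (softmax(q1, tau) - pi1)
        pi2 += beta * (softmax(q2, tau) - pi2)
    return pi1, pi2, q1, q2


if __name__ == "__main__":
    matching_pennies = [[1.0, -1.0], [-1.0, 1.0]]
    pi1, pi2, _, _ = run_dynamics(matching_pennies)
    print("player 1 strategy:", pi1)
    print("player 2 strategy:", pi2)
```

Because the softmax assigns positive probability to every action, each action keeps being sampled without a separate exploration step, which is in the spirit of the "without additional exploration" finding listed above. The sketch makes no claim about convergence rates.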
Taxonomy
Topics: Advanced Bandit Algorithms Research
Methods: Attentive Walk-Aggregating Graph Neural Network
