Instance-Dependent Regret Bounds for Learning Two-Player Zero-Sum Games with Bandit Feedback

Shinji Ito; Haipeng Luo; Taira Tsuchiya; Yue Wu

arXiv:2502.17625 · cs.LG · February 26, 2025

TL;DR

This paper demonstrates that in two-player zero-sum games with bandit feedback, players using the Tsallis-INF algorithm can achieve accelerated regret bounds and convergence, especially when a pure strategy Nash equilibrium exists.

Contribution

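It shows that regret acceleration in two-player zero-sum games is possible under bandit feedback alone, whereas almost all prior acceleration results assume exact gradient feedback, and it gives improved, instance-dependent regret bounds together with convergence guarantees.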

Findings

01

Regret bound of O(c_1 log T + sqrt(c_2 T)) with bandit feedback

02

Optimal regret bound when a pure strategy Nash equilibrium exists

03

The algorithm achieves last-iterate convergence and near-optimal sample complexity
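
Finding 02 applies only when the game has a pure-strategy Nash equilibrium. For a loss matrix A in which the row player minimizes A[i, j] and the column player maximizes it (a convention assumed here), such an equilibrium is a saddle point: an entry that is the smallest in its column and the largest in its row. The helper below is a hypothetical sketch, not code from the paper, that lists every such entry.

```python
import numpy as np


def pure_nash_equilibria(A):
    """Saddle points of loss matrix A (row player minimizes, column player maximizes)."""
    A = np.asarray(A, dtype=float)
    col_min = A.min(axis=0)  # row player's best-response value against each column
    row_max = A.max(axis=1)  # column player's best-response value against each row
    return [(i, j)
            for i in range(A.shape[0])
            for j in range(A.shape[1])
            if A[i, j] == col_min[j] and A[i, j] == row_max[i]]


print(pure_nash_equilibria([[0.2, 0.4], [0.6, 0.8]]))  # [(0, 1)]
print(pure_nash_equilibria([[0.0, 1.0], [1.0, 0.0]]))  # [] (matching pennies)
```

Matching pennies has no saddle point, so its only equilibrium is mixed.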

Abstract

No-regret self-play learning dynamics have become one of the premier ways to solve large-scale games in practice. Accelerating their convergence via improving the regret of the players over the naive $O(\sqrt{T})$ bound after $T$ rounds has been extensively studied in recent years, but almost all studies assume access to exact gradient feedback. We address the question of whether acceleration is possible under bandit feedback only and provide an affirmative answer for two-player zero-sum normal-form games. Specifically, we show that if both players apply the Tsallis-INF algorithm of Zimmert and Seldin (2018, arXiv:1807.07623), then their regret is at most $O(c_{1} \log T + \sqrt{c_{2} T})$, where $c_{1}$ and $c_{2}$ are game-dependent constants that characterize the difficulty of learning: $c_{1}$ resembles the complexity of learning a stochastic multi-armed bandit instance and depends…
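
To make the self-play setting in the abstract concrete, here is a minimal NumPy sketch written for illustration; it is not taken from the authors' repository. Both players run Tsallis-INF with the 1/2-Tsallis regularizer. Minimizing <p, L> - (4/eta) * sum_i sqrt(p_i) over the simplex gives the closed form p_i = 4 / (eta * (L_i - lam))^2, where lam < min_i L_i is chosen (here by bisection) so that the p_i sum to 1. The learning rate eta_t = 1/sqrt(t), the plain importance-weighted loss estimates, the loss convention (the row player pays A[i, j] in [0, 1] and the column player pays 1 - A[i, j]), and all function names are assumptions; the constants and estimator used in Zimmert and Seldin's analysis may differ.

```python
import numpy as np


def tsallis_inf_probs(cum_loss, eta):
    """Play distribution of 1/2-Tsallis-INF (illustrative sketch).

    Solves min_p <p, cum_loss> - (4 / eta) * sum(sqrt(p)) over the simplex:
    p_i = 4 / (eta * (cum_loss_i - lam))**2, with lam < min(cum_loss) chosen
    by bisection so that the p_i sum to 1.
    """
    k = len(cum_loss)
    l_min = cum_loss.min()
    lo = l_min - 2.0 * np.sqrt(k) / eta  # every p_i <= 1/k here, so sum(p) <= 1
    hi = l_min - 2.0 / eta               # the best arm alone has p_i = 1 here
    for _ in range(50):
        lam = 0.5 * (lo + hi)
        if np.sum(4.0 / (eta * (cum_loss - lam)) ** 2) > 1.0:
            hi = lam  # sum(p) increases with lam, so the root lies below
        else:
            lo = lam
    p = 4.0 / (eta * (cum_loss - 0.5 * (lo + hi))) ** 2
    return p / p.sum()  # remove residual bisection error


def duality_gap(A, x, y):
    """max_j (x^T A)_j - min_i (A y)_i; zero exactly at a Nash equilibrium."""
    return np.max(x @ A) - np.min(A @ y)


def self_play(A, T, seed=0):
    """Both players run Tsallis-INF on loss matrix A in [0, 1] with bandit feedback.

    Returns the last-round strategies and the time-averaged strategies.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    L_row, L_col = np.zeros(m), np.zeros(n)  # importance-weighted cumulative losses
    x_avg, y_avg = np.zeros(m), np.zeros(n)
    for t in range(1, T + 1):
        eta = 1.0 / np.sqrt(t)  # assumed learning-rate schedule
        x = tsallis_inf_probs(L_row, eta)
        y = tsallis_inf_probs(L_col, eta)
        x_avg += x / T
        y_avg += y / T
        i = rng.choice(m, p=x)
        j = rng.choice(n, p=y)
        # Bandit feedback: each player sees only the scalar outcome A[i, j].
        L_row[i] += A[i, j] / x[i]          # row player's loss is A[i, j]
        L_col[j] += (1.0 - A[i, j]) / y[j]  # column player's loss is 1 - A[i, j]
    return x, y, x_avg, y_avg


if __name__ == "__main__":
    A = np.array([[0.2, 0.4], [0.6, 0.8]])  # pure equilibrium at (0, 1)
    x, y, x_avg, y_avg = self_play(A, T=3000)
    print("last iterate:", x, y, "gap:", duality_gap(A, x, y))
    print("average:", x_avg, y_avg, "gap:", duality_gap(A, x_avg, y_avg))
```

Because the duality gap is zero only at a Nash equilibrium, tracking it for the last-round strategies (x, y) is one simple way to observe last-iterate behavior; the averaged strategies are returned alongside for comparison.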

Code & Models

Repositories

EtaoinWu/instance-dependent-game-learning
jax · Official

Taxonomy

Topics: Advanced Bandit Algorithms Research · Auction Theory and Applications · Data Stream Mining Techniques