Two-Player Zero-Sum Games with Bandit Feedback

Elif Yılmaz; Christos Dimitrakakis

arXiv:2506.14518 · cs.LG · February 20, 2026

Open Access

TL;DR

This paper studies algorithms for two-player zero-sum games in which the payoff matrix is unknown and must be estimated from bandit feedback, and it derives instance-dependent regret bounds for learning pure strategy Nash equilibria.

Contribution

The paper introduces three algorithms for zero-sum games with bandit feedback, based on Explore-Then-Commit (ETC) and action pair elimination, and provides novel instance-dependent regret bounds for them.

Findings

01

Achieves $O\left(\frac{\log(T\Delta^2)}{\Delta}\right)$ regret bounds for adaptive elimination.

02

Attains $O\left(\Delta + \frac{1}{\sqrt{T}}\right)$ regret bounds for ETC.

03

Demonstrates the effectiveness of ETC and elimination algorithms in learning pure strategy Nash equilibria.
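As a point of reference for the findings above, a pure strategy Nash equilibrium of a zero-sum matrix game is a saddle point: an entry that is the largest in its column (the row player cannot improve) and the smallest in its row (the column player cannot improve). The sketch below checks this for a known payoff matrix. The function name `pure_nash_equilibria`, the tolerance `tol`, and the example matrix are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def pure_nash_equilibria(A, tol=1e-12):
    """Return all (row, col) saddle points of A, where the row player maximizes."""
    A = np.asarray(A, dtype=float)
    col_max = A.max(axis=0)  # best payoff the row player can get in each column
    row_min = A.min(axis=1)  # worst payoff the row player can get in each row
    # (i, j) is a pure equilibrium iff A[i, j] is the maximum of column j
    # and the minimum of row i.
    is_saddle = (A >= col_max[None, :] - tol) & (A <= row_min[:, None] + tol)
    return [(int(i), int(j)) for i, j in zip(*np.nonzero(is_saddle))]

# Hypothetical payoff matrix with a single saddle point at (0, 1).
A = np.array([[3.0, 1.0, 4.0],
              [2.0, 0.0, 5.0],
              [6.0, -1.0, 2.0]])
saddles = pure_nash_equilibria(A)

# Matching pennies has no pure strategy equilibrium.
pennies = pure_nash_equilibria([[1.0, -1.0], [-1.0, 1.0]])
```

Games such as matching pennies return an empty list, which is why a focus on pure strategy equilibria restricts attention to matrices that have a saddle point.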

Abstract

We study a two-player zero-sum game in which the row player aims to maximize their payoff against a competing column player, under an unknown payoff matrix estimated through bandit feedback. We propose three algorithms based on the Explore-Then-Commit (ETC) and action pair elimination frameworks. The first adapts ETC to zero-sum games, the second incorporates adaptive elimination that leverages the $\varepsilon$-Nash equilibrium property to efficiently select the optimal action pair, and the third extends the elimination algorithm with non-uniform exploration. Our objective is to demonstrate the applicability of ETC and action pair elimination algorithms in a zero-sum game setting by focusing on learning pure strategy Nash equilibria. A key contribution of our work is a derivation of instance-dependent upper bounds on the expected regret of our proposed algorithms, which has…
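To make the Explore-Then-Commit idea concrete, here is a minimal sketch under stated assumptions; it is not the authors' algorithm. It assumes the learner chooses the action pair in every round, that each observation is the true payoff plus Gaussian noise, and that exploration is uniform over all pairs. The names `etc_zero_sum`, `explore_rounds`, and `noise_std` are hypothetical.

```python
import numpy as np

def etc_zero_sum(A, explore_rounds, horizon, noise_std=0.1, seed=0):
    """Explore-Then-Commit sketch for a zero-sum matrix game (illustrative only)."""
    rng = np.random.default_rng(seed)
    A = np.asarray(A, dtype=float)
    n_rows, n_cols = A.shape
    explore_total = explore_rounds * n_rows * n_cols
    if explore_total > horizon:
        raise ValueError("exploration budget exceeds the horizon")

    # Exploration: play every action pair explore_rounds times and observe
    # only the noisy payoff of the pair that was played (bandit feedback).
    noise = noise_std * rng.standard_normal((explore_rounds, n_rows, n_cols))
    A_hat = (A + noise).mean(axis=0)  # empirical mean payoff per action pair

    # Commit: maximin row and minimax column of the estimated matrix,
    # played for all remaining rounds.
    row = int(np.argmax(A_hat.min(axis=1)))
    col = int(np.argmin(A_hat.max(axis=0)))
    return (row, col), A_hat, horizon - explore_total

# Hypothetical payoff matrix whose pure equilibrium is the pair (0, 1).
A = np.array([[3.0, 1.0, 4.0],
              [2.0, 0.0, 5.0],
              [6.0, -1.0, 2.0]])
pair, A_hat, commit_rounds = etc_zero_sum(A, explore_rounds=50, horizon=10_000)
```

The exploration length drives the trade-off: more exploration makes a wrong commitment less likely but spends more rounds on suboptimal pairs. An elimination-style method would instead stop sampling pairs once the estimates rule them out.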

Taxonomy

Topics: Advanced Bandit Algorithms Research · Game Theory and Applications · Auction Theory and Applications

Methods: Explore-Then-Commit (ETC) · Action Pair Elimination