Two-Player Zero-Sum Games with Bandit Feedback
Elif Yılmaz, Christos Dimitrakakis

TL;DR
This paper investigates algorithms for two-player zero-sum games with unknown payoffs, using bandit feedback, and derives instance-dependent regret bounds demonstrating effective learning of Nash equilibria.
Contribution
It introduces three algorithms adapted for zero-sum games with bandit feedback and provides novel instance-dependent regret bounds for these methods.
Findings
Achieves $O\left(\frac{\log(T\Delta^2)}{\Delta}\right)$ regret bounds for adaptive elimination.
Attains $O\left(\Delta + \frac{1}{\sqrt{T}}\right)$ regret bounds for ETC.
Demonstrates the effectiveness of ETC and elimination algorithms in learning pure strategy Nash equilibria.
Abstract
We study a two-player zero-sum game in which the row player aims to maximize their payoff against a competing column player, under an unknown payoff matrix estimated through bandit feedback. We propose three algorithms based on the Explore-Then-Commit (ETC) and action pair elimination frameworks. The first adapts ETC to zero-sum games, the second incorporates adaptive elimination that leverages the $\varepsilon$-Nash Equilibrium property to efficiently select the optimal action pair, and the third extends the elimination algorithm by employing non-uniform exploration. Our objective is to demonstrate the applicability of ETC and action pair elimination algorithms in a zero-sum game setting by focusing on learning pure strategy Nash Equilibria. A key contribution of our work is a derivation of instance-dependent upper bounds on the expected regret of our proposed algorithms, which has…
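To make the ETC idea in the abstract concrete, here is a minimal Python sketch, not the authors' algorithm. It assumes uniform exploration of every action pair, Gaussian payoff noise, and a payoff matrix that has a pure-strategy Nash equilibrium (a saddle point). The function name `etc_pure_nash` and the parameters `rounds_per_pair` and `noise_std` are hypothetical names chosen for illustration.

```python
import numpy as np


def etc_pure_nash(payoff_means, rounds_per_pair, rng, noise_std=0.1):
    """Explore every action pair uniformly, then commit to the empirical saddle point.

    payoff_means is the true (unknown to the learner) payoff matrix of the
    row player; it is used here only to simulate noisy bandit feedback.
    """
    n_rows, n_cols = payoff_means.shape
    payoff_sums = np.zeros((n_rows, n_cols))

    # Exploration phase: each pair (i, j) is played rounds_per_pair times,
    # and one noisy payoff is observed per play (assumed Gaussian noise).
    for _ in range(rounds_per_pair):
        noise = noise_std * rng.standard_normal((n_rows, n_cols))
        payoff_sums += payoff_means + noise

    estimate = payoff_sums / rounds_per_pair

    # Commit phase: the row player takes the maximin row and the column
    # player takes the minimax column of the estimated matrix.
    row = int(np.argmax(estimate.min(axis=1)))
    col = int(np.argmin(estimate.max(axis=0)))
    return row, col, estimate


# Example game with a saddle point at (0, 0): 0.5 is the minimum of its row
# and the maximum of its column.
example_matrix = np.array([[0.5, 0.9],
                           [0.1, 0.3]])
row, col, estimate = etc_pure_nash(
    example_matrix, rounds_per_pair=200, rng=np.random.default_rng(0)
)
```

When the gaps between the equilibrium entry and its row and column neighbours are large relative to the noise, a short exploration phase is enough for the committed pair to match the true equilibrium. Smaller gaps call for more exploration, which is the kind of dependence an instance-dependent regret bound describes.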
Taxonomy
Topics: Advanced Bandit Algorithms Research · Game Theory and Applications · Auction Theory and Applications
Methods: Dense Connections · Contrastive Predictive Coding · Layer Normalization · Relative Position Encodings · Extended Transformer Construction
