Generalized Bandit Regret Minimizer Framework in Imperfect Information Extensive-Form Game
Linjian Meng, Yang Gao

TL;DR
This paper introduces a generalized framework for regret minimization in imperfect information games with bandit feedback, enabling more efficient learning of approximate Nash equilibria.
Contribution
It proposes a modular theoretical framework for bandit regret minimization, analyzes existing methods as special cases of that framework, and introduces a novel, more efficient algorithm, SIX-OMD.
Findings
SIX-OMD significantly improves convergence rates over previous methods.
The framework unifies analysis of various bandit regret minimization algorithms.
SIX-OMD is computationally efficient, requiring only current and average strategy updates.
Abstract
Regret minimization methods are a powerful tool for learning approximate Nash equilibrium (NE) in two-player zero-sum imperfect information extensive-form games (IIEGs). We consider the problem in the interactive bandit-feedback setting where we don't know the dynamics of the IIEG. In general, only the interactive trajectory and the reached terminal node value are revealed. To learn NE, the regret minimizer is required to estimate the full-feedback loss gradient from this limited feedback and minimize the regret. In this paper, we propose a generalized framework for this learning setting. It presents a theoretical framework for the design and the modular analysis of bandit regret minimization methods. We demonstrate that the most recent bandit regret minimization methods can be analyzed as a particular case of our framework. Following this framework, we describe a novel method…
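The bandit-feedback idea in the abstract (estimate the full loss vector from a single observed outcome, then take a regret-minimizing step) can be illustrated at a single decision point. The sketch below is not SIX-OMD and is not taken from the paper: it is a generic Exp3-style loop, assuming one decision node, losses in [0, 1], an importance-weighted loss estimator, and an online mirror descent step with the entropy regularizer. All function names and parameters are hypothetical.

```python
import math
import random


def importance_weighted_loss(num_actions, action, observed_loss, prob):
    """Estimate the full loss vector from the one entry that was observed.

    Dividing by the sampling probability makes the estimate unbiased:
    averaging over the sampled action recovers the true loss vector.
    """
    estimate = [0.0] * num_actions
    estimate[action] = observed_loss / prob
    return estimate


def omd_entropy_step(strategy, loss_estimate, lr):
    """One online mirror descent step with the entropy regularizer.

    On the probability simplex this is the multiplicative-weights update.
    """
    weights = [p * math.exp(-lr * l) for p, l in zip(strategy, loss_estimate)]
    total = sum(weights)
    return [w / total for w in weights]


def run_bandit(true_losses, rounds, lr, seed=0):
    """Play `rounds` rounds, observing only the loss of the sampled action."""
    rng = random.Random(seed)
    n = len(true_losses)
    strategy = [1.0 / n] * n   # current strategy
    average = [0.0] * n        # running average of the strategies played
    for t in range(1, rounds + 1):
        action = rng.choices(range(n), weights=strategy)[0]
        estimate = importance_weighted_loss(
            n, action, true_losses[action], strategy[action]
        )
        average = [a + (p - a) / t for a, p in zip(average, strategy)]
        strategy = omd_entropy_step(strategy, estimate, lr)
    return strategy, average


if __name__ == "__main__":
    current, avg = run_bandit([0.9, 0.1, 0.5], rounds=2000, lr=0.05)
    print("current strategy:", current)
    print("average strategy:", avg)
```

In an extensive-form game the same two ingredients (a loss estimator built from the sampled trajectory and a regret-minimizing update) would be applied across information sets rather than at one node; how the paper's framework does this is not shown here.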
Taxonomy
Topics: Advanced Bandit Algorithms Research · Game Theory and Applications
