Efficient Phi-Regret Minimization in Extensive-Form Games via Online Mirror Descent
Yu Bai, Chi Jin, Song Mei, Ziang Song, Tiancheng Yu

TL;DR
This paper introduces a polynomial-time online mirror descent approach for learning equilibria in extensive-form games, achieving optimal regret bounds and connecting game-theoretic algorithms with mirror descent techniques.
Contribution
It establishes a novel equivalence between Phi-Hedge algorithms and online mirror descent for EFGs, enabling efficient equilibrium learning with improved regret guarantees.
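The simplest instance of this kind of equivalence is the normal-form one. On a single probability simplex, Hedge (multiplicative weights) coincides with OMD using the negative-entropy regularizer. The Python sketch below checks this numerically. It covers only that textbook base case, not the paper's EFG result. The helper names hedge_update and entropic_omd_update are hypothetical.

```python
import numpy as np

# Hypothetical helpers; normal-form (single simplex) case only,
# not the EFG construction from the paper.


def hedge_update(x, loss, eta):
    # Hedge / multiplicative weights: x_i <- x_i * exp(-eta * loss_i), renormalized.
    w = x * np.exp(-eta * loss)
    return w / w.sum()


def entropic_omd_update(x, loss, eta):
    # OMD with mirror map psi(x) = sum_i x_i log x_i (negative entropy).
    theta = np.log(x) + 1.0           # 1. map to the dual space: grad psi(x)
    theta = theta - eta * loss        # 2. gradient step in the dual space
    y = np.exp(theta - 1.0)           # 3. map back: inverse of grad psi
    return y / y.sum()                # 4. KL (Bregman) projection onto the simplex = renormalize


rng = np.random.default_rng(0)
n, T, eta = 5, 50, 0.1
x_hedge = np.full(n, 1.0 / n)
x_omd = x_hedge.copy()
for _ in range(T):
    loss = rng.uniform(0.0, 1.0, size=n)
    x_hedge = hedge_update(x_hedge, loss, eta)
    x_omd = entropic_omd_update(x_omd, loss, eta)

print(np.allclose(x_hedge, x_omd))  # True: the two iterate sequences coincide
```

The only step that differs from generic OMD is step 4. For the entropic mirror map, the KL projection onto the simplex reduces to renormalization, which is why the two updates agree exactly.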
Findings
Polynomial-time algorithms for EFCE with optimal regret.
Equivalence between Phi-Hedge and OMD in EFGs.
Bandit-feedback regret bounds that match the corresponding lower bounds.
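Polynomial running time is the crux because of how strategy representations scale in a tree-form decision problem. The reduced normal form has one pure strategy for every consistent choice of actions along reachable information sets. The sequence form needs only one coordinate per (information set, action) pair. The Python sketch below counts both. It is an illustration under assumptions: the uniform tree produced by the hypothetical build helper (every action leads to the same number of new information sets) is a toy example, not a game from the paper.

```python
from dataclasses import dataclass


@dataclass
class InfoSet:
    # children[a] lists the information sets the player may face next after
    # action a (one per possible observation); [] means the branch ends.
    children: list


def num_sequences(I):
    # Sequence-form dimension: one coordinate per (information set, action) pair.
    return sum(1 + sum(num_sequences(J) for J in obs) for obs in I.children)


def num_reduced_pure_strategies(I):
    # Choose one action here, then an independent strategy at every
    # information set reachable after that action (unreachable ones are left unspecified).
    total = 0
    for obs in I.children:
        prod = 1
        for J in obs:
            prod *= num_reduced_pure_strategies(J)
        total += prod
    return total


def build(depth, actions, observations):
    # Hypothetical uniform tree: each action leads to `observations` new infosets.
    if depth == 1:
        return InfoSet(children=[[] for _ in range(actions)])
    return InfoSet(children=[[build(depth - 1, actions, observations)
                              for _ in range(observations)]
                             for _ in range(actions)])


for depth in range(1, 6):
    tree = build(depth, actions=2, observations=2)
    print(depth, num_sequences(tree), num_reduced_pure_strategies(tree))
```

With two actions and two observations per decision, depth 5 already gives 682 sequences versus 2147483648 reduced pure strategies. The sequence count grows linearly in the number of information sets, while the normal-form count grows exponentially.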
Abstract
A conceptually appealing approach for learning Extensive-Form Games (EFGs) is to convert them to Normal-Form Games (NFGs). This approach enables us to directly translate state-of-the-art techniques and analyses in NFGs to learning EFGs, but typically suffers from computational intractability due to the exponential blow-up of the game size introduced by the conversion. In this paper, we address this problem in natural and important setups for the Phi-Hedge algorithm, a generic algorithm capable of learning a large class of equilibria for NFGs. We show that Phi-Hedge can be directly used to learn Nash Equilibria (in zero-sum settings), Normal-Form Coarse Correlated Equilibria (NFCCE), and Extensive-Form Correlated Equilibria (EFCE) in EFGs. We prove that, in those settings, the Phi-Hedge algorithms are equivalent to standard Online Mirror Descent (OMD) algorithms for…
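Here is a minimal normal-form sketch of a Phi-Hedge-style learner. It assumes the textbook no-Phi-regret construction: run Hedge over a finite set Phi of linear strategy transformations, then at each round play a fixed point of the weighted mixture of those transformations. Phi is assumed here to be the identity plus every "move all mass of action a onto action b" map. The Phi-regret for this choice is internal regret, which is tied to correlated equilibria in NFGs. All helper names are hypothetical. The code enumerates actions and Phi explicitly, and that is exactly what becomes intractable when the actions are the exponentially many pure strategies of a converted EFG.

```python
import numpy as np


def internal_transformations(n):
    # Hypothetical helper. Phi = {identity} plus every map "move all mass of
    # action a onto action b", stored as row-stochastic M with phi(x) = x @ M.
    mats = [np.eye(n)]
    for a in range(n):
        for b in range(n):
            if a != b:
                M = np.eye(n)
                M[a, a] = 0.0
                M[a, b] = 1.0
                mats.append(M)
    return np.stack(mats)  # shape (|Phi|, n, n)


def fixed_point(M):
    # Find x on the simplex with x @ M = x (a stationary distribution of M)
    # by least squares on (M^T - I) x = 0 stacked with sum(x) = 1.
    n = M.shape[0]
    A = np.vstack([M.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    x = np.clip(x, 0.0, None)  # drop tiny negative round-off
    return x / x.sum()


def phi_hedge(losses, eta):
    # Sketch over internal transformations; losses has shape (T, n),
    # entries assumed to lie in [0, 1].
    T, n = losses.shape
    Phi = internal_transformations(n)
    log_w = np.zeros(len(Phi))  # log Hedge weights over Phi, uniform start
    plays = []
    for t in range(T):
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        M_t = np.einsum("k,kij->ij", w, Phi)  # weighted mixture of transformations
        x_t = fixed_point(M_t)                # play a fixed point of the mixture
        plays.append(x_t)
        # Loss of each phi is the loss of the transformed play phi(x_t) = x_t @ M.
        phi_losses = np.einsum("i,kij,j->k", x_t, Phi, losses[t])
        log_w -= eta * phi_losses
    return np.array(plays)


rng = np.random.default_rng(1)
plays = phi_hedge(rng.uniform(0.0, 1.0, size=(30, 3)), eta=0.5)
print(plays[-1])
```

The fixed point is what makes the reduction work. Since x_t M_t = x_t, the learner's loss <x_t, l_t> equals the Hedge mixture loss sum_k w_k <x_t M_k, l_t>, so the Phi-regret is bounded by Hedge's external regret over Phi. Because every swap map gets positive weight, M_t is irreducible and the fixed point is unique.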
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Auction Theory and Applications
