Stochastic Regret Minimization in Extensive-Form Games
Gabriele Farina, Christian Kroer, and Tuomas Sandholm

TL;DR
This paper introduces a flexible framework for stochastic regret minimization in extensive-form games. The framework yields stronger convergence guarantees for MCCFR and enables new algorithms that outperform MCCFR in experiments.
Contribution
It develops a general framework that couples any regret-minimization algorithm with any gradient estimator, extending stochastic methods beyond MCCFR and improving on it both theoretically and empirically.
Findings
New stochastic methods outperform MCCFR in experiments
Framework provides stronger convergence guarantees
Analysis simplifies understanding of MCCFR's properties
Abstract
Monte-Carlo counterfactual regret minimization (MCCFR) is the state-of-the-art algorithm for solving sequential games that are too large for full tree traversals. It works by using gradient estimates that can be computed via sampling. However, stochastic methods for sequential games have not been investigated extensively beyond MCCFR. In this paper we develop a new framework for developing stochastic regret minimization methods. This framework allows us to use any regret-minimization algorithm, coupled with any gradient estimator. The MCCFR algorithm can be analyzed as a special case of our framework, and this analysis leads to significantly stronger theoretical results on convergence, while simultaneously yielding a simplified proof. Our framework allows us to instantiate several new stochastic methods for solving sequential games. We show extensive experiments on three games, where some…
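The idea of pairing a regret minimizer with a sampled gradient estimator can be illustrated with a minimal sketch. It assumes a two-player zero-sum matrix game (rock-paper-scissors) rather than an extensive-form game. It uses regret matching as the regret minimizer and a one-sample unbiased estimate of each player's utility vector as the gradient estimator. All function names here are hypothetical, and these choices are illustrative assumptions, not the specific algorithms or estimators from the paper.

```python
import random


def regret_matching_strategy(cum_regret):
    """Regret matching: play proportionally to positive cumulative regret."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    if total <= 0.0:
        # No positive regret yet: fall back to the uniform strategy.
        return [1.0 / len(cum_regret)] * len(cum_regret)
    return [p / total for p in positive]


def sampled_utility(payoff, opponent_strategy, rng):
    """One-sample unbiased estimate of the utility vector payoff @ opponent_strategy.

    Samples a single opponent action j and returns column j of the payoff matrix.
    """
    actions = range(len(opponent_strategy))
    j = rng.choices(actions, weights=opponent_strategy)[0]
    return [row[j] for row in payoff]


def solve_matrix_game(payoff_row, iterations, seed=0):
    """Self-play with regret matching driven by sampled utility estimates.

    payoff_row[i][j] is the row player's payoff; the game is assumed zero-sum.
    Returns the average strategies of the row and column player.
    """
    rng = random.Random(seed)
    n, m = len(payoff_row), len(payoff_row[0])
    # Column player's payoff matrix, indexed [own action][opponent action].
    payoff_col = [[-payoff_row[i][j] for i in range(n)] for j in range(m)]

    regrets = [[0.0] * n, [0.0] * m]
    strategy_sums = [[0.0] * n, [0.0] * m]

    for _ in range(iterations):
        x = regret_matching_strategy(regrets[0])
        y = regret_matching_strategy(regrets[1])
        # Each player only sees a sampled estimate of its utility vector.
        estimates = [sampled_utility(payoff_row, y, rng),
                     sampled_utility(payoff_col, x, rng)]
        for player, strategy in enumerate((x, y)):
            utility = estimates[player]
            expected = sum(p * u for p, u in zip(strategy, utility))
            for a in range(len(strategy)):
                regrets[player][a] += utility[a] - expected
                strategy_sums[player][a] += strategy[a]

    return [[s / iterations for s in sums] for sums in strategy_sums]


# Rock-paper-scissors payoffs for the row player.
RPS = [[0, -1, 1],
       [1, 0, -1],
       [-1, 1, 0]]

avg_row, avg_col = solve_matrix_game(RPS, iterations=20000)
```

In this sketch the regret minimizer never sees the exact utility vector, only the sampled estimate, yet the average strategies approach the uniform equilibrium of rock-paper-scissors. Replacing `regret_matching_strategy` or `sampled_utility` with another regret minimizer or estimator leaves the loop unchanged, which is the modularity the abstract describes.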
Taxonomy
Topics: Advanced Bandit Algorithms Research · Artificial Intelligence in Games · Reinforcement Learning in Robotics
