Variance Reduction in Monte Carlo Counterfactual Regret Minimization (VR-MCCFR) for Extensive Form Games using Baselines
Martin Schmid, Neil Burch, Marc Lanctot, Matej Moravcik, Rudolf Kadlec, Michael Bowling

TL;DR
This paper introduces VR-MCCFR, a variance reduction technique for Monte Carlo Counterfactual Regret Minimization in extensive form games, significantly speeding up convergence and reducing variance in strategy learning.
Contribution
The paper proposes a novel variance reduction method for MCCFR that combines state-action baselines with bootstrapping, yielding faster convergence and making CFR+ usable with sampling.
Findings
Variance of estimates reduced by three orders of magnitude.
Empirical variance decreases significantly, improving convergence speed.
Enables CFR+ to be used with sampling, which further increases the speedup.
Abstract
Learning strategies for imperfect information games from samples of interaction is a challenging problem. A common method for this setting, Monte Carlo Counterfactual Regret Minimization (MCCFR), can have slow long-term convergence rates due to high variance. In this paper, we introduce a variance reduction technique (VR-MCCFR) that applies to any sampling variant of MCCFR. Using this technique, per-iteration estimated values and updates are reformulated as a function of sampled values and state-action baselines, similar to their use in policy gradient reinforcement learning. The new formulation allows estimates to be bootstrapped from other estimates within the same episode, propagating the benefits of baselines along the sampled trajectory; the estimates remain unbiased even when bootstrapping from other estimates. Finally, we show that given a perfect baseline, the variance of the…
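The role of a state-action baseline described in the abstract can be illustrated with a minimal sketch. It assumes a toy setting that is not taken from the paper: a single decision node, one action sampled per iteration with probability q[a], and a deterministic payoff for each action. The function names, payoffs, and probabilities are hypothetical. The sampled action's estimate is its baseline plus an importance-weighted correction, and every unsampled action falls back to its baseline. Exact enumeration with fractions shows that, in this toy, the estimate stays unbiased for any baseline, and that its variance is zero when the baseline equals the true action values.

```python
from fractions import Fraction


def corrected_estimates(sampled, value, baseline, q):
    """Baseline-corrected value estimate for every action at one node.

    The sampled action gets baseline + (value - baseline) / q[a];
    every other action is estimated by its baseline alone.
    """
    return [
        b + (value - b) / q[a] if a == sampled else b
        for a, b in enumerate(baseline)
    ]


def mean_and_variance(true_values, baseline, q):
    """Exact mean and variance of the estimates over the sampling distribution."""
    n = len(true_values)
    mean = [Fraction(0)] * n
    second_moment = [Fraction(0)] * n
    for sampled in range(n):
        est = corrected_estimates(sampled, true_values[sampled], baseline, q)
        for a in range(n):
            mean[a] += q[sampled] * est[a]
            second_moment[a] += q[sampled] * est[a] ** 2
    variance = [second_moment[a] - mean[a] ** 2 for a in range(n)]
    return mean, variance


# Hypothetical toy data: three actions with deterministic payoffs.
true_values = [Fraction(1), Fraction(-2), Fraction(3)]
q = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]  # sampling probabilities

zero_baseline = [Fraction(0)] * 3      # plain importance sampling
rough_baseline = [Fraction(1, 2), Fraction(-1), Fraction(2)]
perfect_baseline = list(true_values)   # baseline equal to the true values

mean_zero, var_zero = mean_and_variance(true_values, zero_baseline, q)
mean_rough, var_rough = mean_and_variance(true_values, rough_baseline, q)
mean_perfect, var_perfect = mean_and_variance(true_values, perfect_baseline, q)
```

In this sketch all three means equal true_values, while the variance shrinks as the baseline approaches the true values. In a multi-step game the sampled value would itself be a corrected estimate from the next state on the trajectory, which is the bootstrapping the abstract refers to; that recursion is omitted here.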
