RLCFR: Minimize Counterfactual Regret by Deep Reinforcement Learning

Huale Li; Xuan Wang; Fengwei Jia; Yifan Li; Yulin Wu; Jiajia Zhang; Shuhan Qi

arXiv:2009.06373·cs.LG·September 15, 2020

TL;DR

This paper introduces RLCFR, a reinforcement learning framework that enhances the generalization of counterfactual regret minimization in two-player zero-sum imperfect information games by learning adaptive regret updating policies.

Contribution

RLCFR models the CFR iterative process as an MDP and learns a policy for regret updating, improving generalization over existing methods.

Findings

01. Significantly improved generalization ability on various games

02. Outperforms state-of-the-art CFR methods in experiments

03. Effective learning of regret update policies through reinforcement learning

Abstract

Counterfactual regret minimization (CFR) is a popular method for decision-making problems in two-player zero-sum games with imperfect information. Unlike existing studies, which mostly aim to solve larger-scale problems or to speed up the solution process, we propose a framework, RLCFR, that aims to improve the generalization ability of the CFR method. In RLCFR, the game strategy is solved by CFR within a reinforcement learning framework, and the dynamic procedure of iterative, interactive strategy updating is modeled as a Markov decision process (MDP). RLCFR then learns a policy that selects the appropriate regret-updating method at each iteration. In addition, a stepwise reward function, proportional to how good the iteration strategy is at each step, is formulated to learn the action policy. Extensive experimental results on various…
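
The idea of treating each regret update as an action can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy game (rock-paper-scissors as a matrix game), the two-element action set (`"cfr"` and `"cfr_plus"`), the reward (negative exploitability of the average strategy), and the names `run`, `random_policy`, and `UPDATE_RULES`. The placeholder policy picks a rule at random; RLCFR would instead learn this choice with deep reinforcement learning.

```python
import random

# Row player's payoffs in rock-paper-scissors (zero-sum); a toy stand-in game.
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]
UPDATE_RULES = ("cfr", "cfr_plus")  # hypothetical action set, assumed for this sketch


def regret_matching(regrets):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    n = len(regrets)
    return [p / total for p in positive] if total > 0 else [1.0 / n] * n


def action_values(opponent_strategy, as_row):
    """Expected payoff of each pure action against the opponent's mixed strategy."""
    n = len(PAYOFF)
    if as_row:
        return [sum(PAYOFF[a][b] * opponent_strategy[b] for b in range(n)) for a in range(n)]
    return [sum(-PAYOFF[a][b] * opponent_strategy[a] for a in range(n)) for b in range(n)]


def exploitability(row_avg, col_avg):
    """Sum of both players' best-response values; zero at a Nash equilibrium."""
    return max(action_values(col_avg, True)) + max(action_values(row_avg, False))


def random_policy(t, rng):
    """Placeholder policy: choose a regret update rule uniformly at random."""
    return rng.choice(UPDATE_RULES)


def run(select_rule, iterations=200, seed=0):
    rng = random.Random(seed)
    # Asymmetric initial regrets so that play does not start at the equilibrium.
    regrets = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    strategy_sums = [[0.0] * 3, [0.0] * 3]
    rewards = []
    for t in range(1, iterations + 1):
        strategies = [regret_matching(r) for r in regrets]
        rule = select_rule(t, rng)  # the action: which regret update to apply
        for p in (0, 1):
            values = action_values(strategies[1 - p], as_row=(p == 0))
            expected = sum(s * v for s, v in zip(strategies[p], values))
            for a in range(3):
                regrets[p][a] += values[a] - expected
                if rule == "cfr_plus":
                    # CFR+ style update: clip cumulative regret at zero.
                    regrets[p][a] = max(regrets[p][a], 0.0)
                strategy_sums[p][a] += strategies[p][a]
        averages = [[s / t for s in sums] for sums in strategy_sums]
        # Stepwise reward (assumed form): higher when the average strategy is harder to exploit.
        rewards.append(-exploitability(averages[0], averages[1]))
    return averages, rewards
```

Calling `run(random_policy)` returns the two players' average strategies and the per-step rewards. Swapping `random_policy` for `lambda t, rng: "cfr"` or `lambda t, rng: "cfr_plus"` gives fixed-rule baselines to compare against.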

Taxonomy

Topics: Artificial Intelligence in Games · Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research