GPU-Accelerated Counterfactual Regret Minimization
Juho Kim

TL;DR
This paper introduces a GPU-accelerated implementation of counterfactual regret minimization, significantly increasing computational speed for large-scale imperfect information games by leveraging parallel matrix operations.
Contribution
The paper presents a novel GPU-based implementation of counterfactual regret minimization using dense and sparse matrix operations, enabling faster solutions for large games.
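As a rough illustration of what replacing tree traversal with matrix operations can look like, the sketch below computes the reach probability of every node in a tiny game tree through repeated matrix-vector products. The tree, the array names (`parent`, `edge_prob`), and the helper `reach_probabilities` are hypothetical and are not taken from the paper. A dense NumPy matrix stands in for the sparse matrix one would use at scale.

```python
import numpy as np

# Hypothetical 5-node tree: node 0 is the root, parent[i] is the parent of node i.
parent = np.array([-1, 0, 0, 1, 1])
# Probability of the action (or chance outcome) on the edge leading into each node.
edge_prob = np.array([1.0, 0.6, 0.4, 0.5, 0.5])


def reach_probabilities(parent, edge_prob):
    """Propagate reach probabilities level by level without recursion."""
    n = len(parent)
    # transition[child, parent] holds the probability of moving parent -> child.
    transition = np.zeros((n, n))
    children = np.flatnonzero(parent >= 0)
    transition[children, parent[children]] = edge_prob[children]

    frontier = np.zeros(n)
    frontier[parent < 0] = 1.0  # the root is reached with probability 1
    reach = frontier.copy()
    # Each product advances the frontier one level deeper into the tree.
    while frontier.any():
        frontier = transition @ frontier
        reach += frontier
    return reach


reach = reach_probabilities(parent, edge_prob)
```

Each loop iteration handles an entire tree level in one product, which is the kind of operation that maps naturally onto a GPU.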
Findings
Up to 401.2x faster than OpenSpiel's Python implementation
Up to 203.6x faster than OpenSpiel's C++ implementation
Speedup increases with game size
Abstract
Counterfactual regret minimization is a family of algorithms of no-regret learning dynamics capable of solving large-scale imperfect information games. We propose implementing this algorithm as a series of dense and sparse matrix and vector operations, thereby making it highly parallelizable for a graphics processing unit, at a cost of higher memory usage. Our experiments show that our implementation performs up to about 401.2 times faster than OpenSpiel's Python implementation and, on an expanded set of games, up to about 203.6 times faster than OpenSpiel's C++ implementation. The speedup becomes more pronounced as the size of the game being solved grows.
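One step of counterfactual regret minimization that lends itself to this kind of formulation is regret matching, which turns cumulative regrets into a strategy. The following is a minimal sketch, assuming regrets are stored in a padded matrix with one row per information set and a 0/1 mask marking legal actions. The function name `regret_matching` and this layout are assumptions made for illustration, not the paper's actual data structures.

```python
import numpy as np


def regret_matching(regrets, mask):
    """Vectorized regret matching over all information sets at once.

    regrets: (num_infosets, max_actions) cumulative counterfactual regrets
    mask:    (num_infosets, max_actions) 1.0 where the action is legal, else 0.0
    """
    # Keep only positive regrets of legal actions.
    positive = np.maximum(regrets, 0.0) * mask
    totals = positive.sum(axis=1, keepdims=True)
    # Fallback: uniform strategy over legal actions when no regret is positive.
    uniform = mask / mask.sum(axis=1, keepdims=True)
    # Avoid dividing by zero in rows that use the fallback.
    safe_totals = np.where(totals > 0.0, totals, 1.0)
    return np.where(totals > 0.0, positive / safe_totals, uniform)


# Two hypothetical information sets; the second has only two legal actions.
regrets = np.array([[2.0, -1.0, 2.0], [-3.0, -1.0, 0.0]])
mask = np.array([[1.0, 1.0, 1.0], [1.0, 1.0, 0.0]])
strategy = regret_matching(regrets, mask)
```

Padding every information set to the same number of actions is one way such a formulation can trade extra memory for uniform, parallel-friendly operations.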
Peer Reviews
Decision: Submitted to ICLR 2025
Originality: The paper introduces a creative approach by reformulating Counterfactual Regret Minimization (CFR) as matrix operations suitable for GPU processing. This novel restructuring allows a highly parallelizable version of CFR, which has not been extensively explored in existing CFR literature.
Efficiency in Design: By avoiding recursive tree traversal, the implementation achieves substantial speed gains, especially in larger games, demonstrating an efficient design choice that effectivel
Originality Limitations: Although innovative, the paper applies GPU parallelization to the vanilla CFR algorithm, which is somewhat limited in novelty given the existence of other CFR variants that incorporate modern enhancements (e.g., CFR+ or discounting techniques). A broader implementation encompassing these would increase the relevance of this work.
Limited Exploration of Advanced CFR Variants: The paper does not explore compatibility with modern CFR variants, such as sampling-based or dis
This paper is interesting because it tries to solve two problems at the same time:
- APIs like [GraphBLAS](https://graphblas.org/) have successfully represented graph algorithms as a sequence of BLAS-like operations over semirings. This paper tries to do the same for CFR.
- It's not obvious how GPUs, the powerhouse of deep learning, can be used to accelerate game solving (other than calling neural networks). This paper tries to close this gap.
Overall, this paper aims for a best-of-both-worlds approach: low coding effort and high performance. Instead, it ends up with an exposition that is somehow less clear than the original CFR paper, benchmarks that don't inspire confidence, and an algorithm that seems inflexible, requiring major effort for even the simplest changes, such as going from simultaneous to alternating variants of CFR.
- The OpenSpiel codebase is not an example of a performant CFR implementation
The acceleration seems quite significant, as the author claimed.
It is unclear to me whether this paper is a good fit for ICLR, since no new algorithm, methodology, or theory is proposed. It may be a better fit for an ML systems venue. The selected benchmarks (games in OpenSpiel) are not widely known. I would suggest showing improvements on more common benchmarks. I have to admit that I do not have sufficient GPU hardware background to evaluate this paper.
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Anomaly Detection Techniques and Applications · Neural Networks and Applications
Methods: Sparse Evolutionary Training
