A Unified Perspective on Deep Equilibrium Finding
Xinrun Wang, Jakub Cerny, Shuxin Li, Chang Yang, Zhuyun Yin, Hau Chan, Bo An

TL;DR
This paper introduces a unified framework for deep equilibrium-finding algorithms in extensive-form games. The framework generalizes both PSRO and CFR through new neural modules and optimization techniques, and it outperforms both in Leduc poker.
Contribution
It proposes a unified perspective on deep equilibrium-finding algorithms and introduces new neural modules and optimization methods that outperform both PSRO and CFR in Leduc poker.
Findings
Outperforms both PSRO and CFR in Leduc poker
Introduces a novel response oracle that computes Q values, reaching-probability values, and baseline values
Develops a method inspired by fictitious play for component optimization
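
For readers unfamiliar with the fictitious play mentioned in the last finding, the following is a minimal sketch of classical fictitious play on a two-player zero-sum matrix game. It is not the paper's component-optimization method; the function name `fictitious_play`, the payoff-matrix representation, and the matching-pennies example are assumptions made purely for illustration. Each player repeatedly best-responds to the opponent's empirical action counts, and the empirical frequencies approach an equilibrium in zero-sum games.

```python
def fictitious_play(payoff, iterations=10000):
    """Classical fictitious play (illustrative sketch, not the paper's method).

    payoff[i][j] is the row player's payoff; the column player receives
    the negative. Returns the empirical action frequencies of both players.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    # Seed each player's history with one arbitrary action (action 0).
    row_counts = [0] * n_rows
    col_counts = [0] * n_cols
    row_counts[0] += 1
    col_counts[0] += 1

    for _ in range(iterations):
        # Row player's (unnormalized) value of each action against the
        # column player's empirical counts.
        row_values = [
            sum(payoff[i][j] * col_counts[j] for j in range(n_cols))
            for i in range(n_rows)
        ]
        # Row player's payoff for each column action against the row
        # player's empirical counts; the column player wants this small.
        col_values = [
            sum(payoff[i][j] * row_counts[i] for i in range(n_rows))
            for j in range(n_cols)
        ]
        best_row = max(range(n_rows), key=row_values.__getitem__)
        best_col = min(range(n_cols), key=col_values.__getitem__)
        row_counts[best_row] += 1
        col_counts[best_col] += 1

    row_total = sum(row_counts)
    col_total = sum(col_counts)
    return (
        [c / row_total for c in row_counts],
        [c / col_total for c in col_counts],
    )


# Matching pennies: the unique equilibrium mixes both actions equally.
matching_pennies = [[1, -1], [-1, 1]]
row_freq, col_freq = fictitious_play(matching_pennies)
```

In this toy game the empirical frequencies drift toward an even mix of the two actions. The paper applies a fictitious-play-inspired idea to selecting and optimizing framework components, which this sketch does not attempt to reproduce.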
Abstract
Extensive-form games provide a versatile framework for modeling interactions of multiple agents subjected to imperfect observations and stochastic events. In recent years, two paradigms, policy space response oracles (PSRO) and counterfactual regret minimization (CFR), showed that extensive-form games may indeed be solved efficiently. Both of them are capable of leveraging deep neural networks to tackle the scalability issues inherent to extensive-form games and we refer to them as deep equilibrium-finding algorithms. Even though PSRO and CFR share some similarities, they are often regarded as distinct and the answer to the question of which is superior to the other remains ambiguous. Instead of answering this question directly, in this work we propose a unified perspective on deep equilibrium finding that generalizes both PSRO and CFR. Our four main contributions include: i) a novel…
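
As background for the CFR paradigm named in the abstract, here is a minimal sketch of regret matching, the per-decision update that CFR builds on, run in self-play on a small zero-sum matrix game. This is an assumption-laden toy rather than the paper's framework: it uses no neural networks or extensive-form game tree, and the names `regret_matching_strategy`, `solve_matrix_game`, `exploitability`, and the example payoff matrix are all hypothetical.

```python
def regret_matching_strategy(cumulative_regret):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in cumulative_regret]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    # No positive regret yet: fall back to the uniform strategy.
    return [1.0 / len(cumulative_regret)] * len(cumulative_regret)


def solve_matrix_game(payoff, iterations=10000):
    """Self-play regret matching; payoff[i][j] is the row player's payoff."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_regret, col_regret = [0.0] * n_rows, [0.0] * n_cols
    row_sum, col_sum = [0.0] * n_rows, [0.0] * n_cols

    for _ in range(iterations):
        row = regret_matching_strategy(row_regret)
        col = regret_matching_strategy(col_regret)
        # Expected payoff of each pure action against the opponent's mix.
        row_values = [
            sum(payoff[i][j] * col[j] for j in range(n_cols))
            for i in range(n_rows)
        ]
        col_values = [
            -sum(payoff[i][j] * row[i] for i in range(n_rows))
            for j in range(n_cols)
        ]
        row_ev = sum(row[i] * row_values[i] for i in range(n_rows))
        col_ev = sum(col[j] * col_values[j] for j in range(n_cols))
        # Regret: how much better each action would have done than the mix.
        for i in range(n_rows):
            row_regret[i] += row_values[i] - row_ev
            row_sum[i] += row[i]
        for j in range(n_cols):
            col_regret[j] += col_values[j] - col_ev
            col_sum[j] += col[j]

    # The average strategy, not the last iterate, approaches equilibrium.
    return (
        [s / iterations for s in row_sum],
        [s / iterations for s in col_sum],
    )


def exploitability(payoff, row, col):
    """Sum of both players' gains from best-responding to the profile."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    best_row = max(
        sum(payoff[i][j] * col[j] for j in range(n_cols)) for i in range(n_rows)
    )
    best_col = min(
        sum(payoff[i][j] * row[i] for i in range(n_rows)) for j in range(n_cols)
    )
    return best_row - best_col


# Hypothetical 2x2 game whose equilibrium plays the first action with
# probability 2/5 for both players.
example_payoff = [[2.0, -1.0], [-1.0, 1.0]]
row_avg, col_avg = solve_matrix_game(example_payoff)
gap = exploitability(example_payoff, row_avg, col_avg)
```

The exploitability of the averaged strategies shrinks as the number of iterations grows. CFR applies this same update at every information set of an extensive-form game, weighting values by counterfactual reach probabilities.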
Taxonomy
Topics: Artificial Intelligence in Games · Sports Analytics and Performance · Advanced Bandit Algorithms Research
