A Unified Approach to Reinforcement Learning, Quantal Response Equilibria, and Two-Player Zero-Sum Games
Samuel Sokota, Ryan D'Orazio, J. Zico Kolter, Nicolas Loizou, Marc Lanctot, Ioannis Mitliagkas, Noam Brown, Christian Kroer

TL;DR
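This paper introduces magnetic mirror descent, an algorithm inspired by mirror descent and the non-Euclidean proximal gradient algorithm. It serves both as a quantal response equilibrium solver and as a reinforcement learning algorithm for two-player zero-sum games, achieving linear convergence in extensive-form games and strong empirical performance in tabular and deep self-play settings.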
Contribution
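The paper presents magnetic mirror descent, the first quantal response equilibrium solver to achieve linear convergence in extensive-form games with first-order feedback and the first standard reinforcement learning algorithm to achieve results empirically competitive with CFR in tabular settings. It also performs well as a self-play deep reinforcement learning algorithm.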
Findings
Linear convergence to quantal response equilibria in extensive-form games with first-order feedback
Results empirically competitive with CFR in tabular settings
Effective self-play deep RL in 3x3 Dark Hex and Phantom Tic-Tac-Toe
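For orientation, here is a sketch of the kind of update behind these findings, written in our own notation rather than copied from the paper. Assumed symbols: stepsize η, regularization strength α, magnet policy ρ, mirror map ψ with Bregman divergence B_ψ, and operator F (the negated payoff gradients of both players). The first line is the general proximal step; the second is its closed form when ψ is negative entropy (so B_ψ is the KL divergence) for a maximizing player with action values q_t; the third is the resulting fixed point.

```latex
\begin{aligned}
z_{t+1} &= \arg\min_{z \in \mathcal{Z}} \; \eta\big(\langle F(z_t), z\rangle + \alpha\, B_\psi(z;\rho)\big) + B_\psi(z; z_t),\\
\pi_{t+1}(a) &\propto \big[\pi_t(a)\,\rho(a)^{\alpha\eta}\,e^{\eta\, q_t(a)}\big]^{1/(1+\alpha\eta)},\\
\pi^\star(a) &\propto \rho(a)\,e^{q^\star(a)/\alpha}.
\end{aligned}
```

Setting π_{t+1} = π_t = π* in the second line gives (π*)^{αη} ∝ ρ^{αη} e^{ηq*}, which yields the third line. With a uniform magnet ρ, this fixed point is a logit quantal response equilibrium at temperature α.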
Abstract
This work studies an algorithm, which we call magnetic mirror descent, that is inspired by mirror descent and the non-Euclidean proximal gradient algorithm. Our contribution is demonstrating the virtues of magnetic mirror descent both as an equilibrium solver and as an approach to reinforcement learning in two-player zero-sum games. These virtues include: 1) Being the first quantal response equilibria solver to achieve linear convergence for extensive-form games with first-order feedback; 2) Being the first standard reinforcement learning algorithm to achieve empirically competitive results with CFR in tabular settings; 3) Achieving favorable performance in 3x3 Dark Hex and Phantom Tic-Tac-Toe as a self-play deep reinforcement learning algorithm.
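A minimal sketch, assuming a two-player zero-sum normal-form (matrix) game rather than the extensive-form setting the paper targets, of simultaneous magnetic mirror descent with a negative-entropy mirror map and uniform magnets. The function names `mmd_step` and `solve_qre`, the default parameters, and the example payoff matrix are hypothetical illustrations, not the authors' code. Under these assumptions, the iterates should approach the logit quantal response equilibrium at temperature `alpha`.

```python
import numpy as np


def mmd_step(policy, q, magnet, alpha, eta):
    """One magnetic mirror descent step for a maximizing player, using the
    negative-entropy mirror map (KL proximity) and a KL pull toward `magnet`.
    The new policy is proportional to
        [policy * magnet**(alpha*eta) * exp(eta*q)] ** (1 / (1 + alpha*eta)).
    """
    # Compute in log space to avoid overflow/underflow.
    logits = (np.log(policy) + alpha * eta * np.log(magnet) + eta * q) / (1.0 + alpha * eta)
    logits -= logits.max()
    new = np.exp(logits)
    return new / new.sum()


def solve_qre(A, alpha=1.0, eta=0.1, iters=3000):
    """Simultaneous MMD in a zero-sum matrix game: the row player maximizes
    x^T A y and the column player minimizes it. Uniform magnets are assumed."""
    m, n = A.shape
    x = np.full(m, 1.0 / m)
    y = np.full(n, 1.0 / n)
    rho_x, rho_y = x.copy(), y.copy()  # uniform magnet policies
    for _ in range(iters):
        q_x = A @ y      # row player's action values
        q_y = -A.T @ x   # column player's action values (negated, since it minimizes)
        # Both players update from the previous iterate (simultaneous play).
        x, y = mmd_step(x, q_x, rho_x, alpha, eta), mmd_step(y, q_y, rho_y, alpha, eta)
    return x, y


if __name__ == "__main__":
    # Hypothetical biased rock-paper-scissors payoffs for the row player.
    A = np.array([[0.0, -1.0, 2.0],
                  [1.0, 0.0, -1.0],
                  [-1.0, 1.0, 0.0]])
    x, y = solve_qre(A)
    print("row policy:", np.round(x, 4))
    print("column policy:", np.round(y, 4))
```

The stepsize `eta` is kept small relative to `alpha` and the payoff scale. Convergence guarantees for this kind of update depend on that ratio, so treat the defaults as an assumption rather than a recommended setting. At convergence, `x` should match a softmax of `A @ y / alpha`, and `y` should match a softmax of `-A.T @ x / alpha`.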
Taxonomy
Topics: Experimental Behavioral Economics Studies · Game Theory and Applications · Reinforcement Learning in Robotics
Methods: Entropy Regularization · Proximal Policy Optimization
