NashPG: A Policy Gradient Method with Iteratively Refined Regularization for Finding Nash Equilibria
Eason Yu, Tzu Hao Liu, Clément L. Canonne, Yunke Wang, Chang Xu, Nguyen H. Tran, Stefano V. Albrecht

TL;DR
NashPG is a policy gradient method built on an iteratively refined regularization procedure that is shown to converge to Nash equilibria in two-player zero-sum extensive-form games. It scales to large games and matches or improves on prior methods.
Contribution
The paper proposes a novel regularization framework integrated into policy gradient methods, ensuring convergence to Nash equilibria without full game tree enumeration.
Findings
NashPG achieves comparable or lower exploitability than prior methods.
It scales effectively to large domains like Battleship and Poker.
NashPG attains higher average payoff in head-to-head matches against prior methods.
Abstract
Finding Nash equilibria in two-player zero-sum imperfect-information games remains a central challenge in multi-agent reinforcement learning. Recent multi-round regularization methods offer a promising direction, yet existing approaches either require full enumeration of the game tree or rely on non-policy-gradient inner solvers that underperform in practice, leaving a scalable policy-gradient-based solution open. In this paper, we propose a novel multi-round regularization procedure and show that it guarantees strictly monotonic reduction in Bregman divergence to Nash equilibria and eventual convergence to one in two-player zero-sum extensive-form games. Guided by this framework, we develop a practical algorithm, Nash Policy Gradient (NashPG), which places the regularization directly in the policy optimization objective and is implemented using standard policy gradient methods.…
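The multi-round idea in the abstract can be illustrated on a toy matrix game. The sketch below is not the authors' implementation: the 2x2 payoff matrix, the KL penalty toward a reference policy, the softmax natural-policy-gradient inner update, and every hyperparameter (`alpha`, `lr`, `rounds`, `inner_steps`) are assumptions chosen for illustration. Each round approximately solves a regularized game with exact gradients, then moves the reference policy to the solution before the next round.

```python
import math

# Hypothetical 2x2 zero-sum game: entry [i][j] is the row player's payoff.
# Its unique Nash equilibrium is x = y = (0.4, 0.6).
PAYOFF = [[2.0, -1.0], [-1.0, 1.0]]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def action_values(payoff, x, y):
    """Expected payoff of each pure action for the row and column player."""
    row = [sum(payoff[i][j] * y[j] for j in range(len(y))) for i in range(len(x))]
    # The column player receives the negated payoff.
    col = [-sum(x[i] * payoff[i][j] for i in range(len(x))) for j in range(len(y))]
    return row, col


def exploitability(payoff, x, y):
    """Average best-response value of both players; zero at a Nash equilibrium."""
    row, col = action_values(payoff, x, y)
    return (max(row) + max(col)) / 2.0


def regularized_step(logits, values, reference, alpha, lr):
    """One logit update on payoff minus alpha * KL(policy || reference)."""
    policy = softmax(logits)
    signal = [v - alpha * (math.log(p) - math.log(r))
              for v, p, r in zip(values, policy, reference)]
    baseline = sum(p * s for p, s in zip(policy, signal))
    # Assumed inner solver: natural policy gradient for a softmax policy,
    # which adds the baseline-corrected signal to the logits.
    return [l + lr * (s - baseline) for l, s in zip(logits, signal)]


def solve(payoff, rounds=30, inner_steps=200, alpha=1.0, lr=0.1):
    """Iteratively refined regularization (illustrative sketch only)."""
    n, m = len(payoff), len(payoff[0])
    x_logits, y_logits = [0.0] * n, [0.0] * m
    x_ref, y_ref = softmax(x_logits), softmax(y_logits)
    for _ in range(rounds):
        # Inner loop: simultaneous updates on the regularized game.
        for _ in range(inner_steps):
            x, y = softmax(x_logits), softmax(y_logits)
            row, col = action_values(payoff, x, y)
            x_logits = regularized_step(x_logits, row, x_ref, alpha, lr)
            y_logits = regularized_step(y_logits, col, y_ref, alpha, lr)
        # Refinement: the current policies become the next reference.
        x_ref, y_ref = softmax(x_logits), softmax(y_logits)
    return softmax(x_logits), softmax(y_logits)


if __name__ == "__main__":
    x, y = solve(PAYOFF)
    print("row policy:", x)
    print("column policy:", y)
    print("exploitability:", exploitability(PAYOFF, x, y))
```

With these settings both policies should approach (0.4, 0.6) and the exploitability should shrink toward zero. Keeping the reference fixed instead of refining it would leave the policies at the regularized solution, which is biased toward the reference and generally differs from the Nash equilibrium.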
