NashPG: A Policy Gradient Method with Iteratively Refined Regularization for Finding Nash Equilibria

Eason Yu; Tzu Hao Liu; Clément L. Canonne; Yunke Wang; Chang Xu; Nguyen H. Tran; Stefano V. Albrecht

arXiv:2510.18183 · cs.LG · May 1, 2026

TL;DR

NashPG is a policy gradient method built on an iteratively refined regularization procedure that provably converges to Nash equilibria in two-player zero-sum extensive-form games. Empirically, it scales to large domains and matches or outperforms prior methods.

Contribution

The paper proposes a multi-round regularization procedure with a convergence guarantee to Nash equilibria and places that regularization directly in the policy gradient objective, avoiding full enumeration of the game tree.

Findings

01. NashPG achieves comparable or lower exploitability than prior methods.

02. NashPG scales effectively to large domains such as Battleship and Poker.

03. NashPG attains higher average payoff in head-to-head matches.

Abstract

Finding Nash equilibria in two-player zero-sum imperfect-information games remains a central challenge in multi-agent reinforcement learning. Recent multi-round regularization methods offer a promising direction, yet existing approaches either require full enumeration of the game tree or rely on non-policy-gradient inner solvers that underperform in practice, leaving a scalable policy-gradient-based solution open. In this paper, we propose a novel multi-round regularization procedure and show that it guarantees strictly monotonic reduction in Bregman divergence to Nash equilibria and eventual convergence to one in two-player zero-sum extensive-form games. Guided by this framework, we develop a practical algorithm, Nash Policy Gradient (NashPG), which places the regularization directly in the policy optimization objective and is implemented using standard policy gradient methods.…
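
The multi-round regularization idea described in the abstract can be illustrated on a toy problem. The following is a minimal sketch and not the paper's NashPG algorithm: it assumes a tabular matrix game (rock-paper-scissors) instead of an extensive-form game, uses KL divergence as one possible Bregman divergence, and swaps in a closed-form mirror-descent-style update as the inner solver where NashPG would use standard policy gradient methods. All names (`solve_regularized`, `multi_round`, `tau`, `eta`) and all hyperparameter values are hypothetical choices made for this illustration.

```python
import numpy as np

def softmax(logits):
    """Map unnormalized log-weights to a probability vector (numerically stable)."""
    shifted = logits - logits.max()
    weights = np.exp(shifted)
    return weights / weights.sum()

def exploitability(A, x, y):
    """Sum of both players' best-response gains; zero exactly at a Nash equilibrium."""
    return (A @ y).max() - (A.T @ x).min()

def solve_regularized(A, x_ref, y_ref, tau, eta=0.1, steps=300):
    """Approximately solve the game regularized by tau * KL(policy || reference).

    Assumption: the inner solver is a KL-proximal (mirror-descent-style)
    update with a closed form, used here only because it is short to write.
    """
    x, y = x_ref.copy(), y_ref.copy()
    scale = 1.0 + eta * tau
    for _ in range(steps):
        grad_x = A @ y        # row player maximizes x^T A y
        grad_y = -(A.T @ x)   # column player minimizes x^T A y
        # Simultaneous updates: step along the payoff gradient while being
        # pulled toward the fixed reference policy of the current round.
        x_next = softmax((np.log(x) + eta * tau * np.log(x_ref) + eta * grad_x) / scale)
        y_next = softmax((np.log(y) + eta * tau * np.log(y_ref) + eta * grad_y) / scale)
        x, y = x_next, y_next
    return x, y

def multi_round(A, x0, y0, tau=0.5, rounds=30):
    """Outer loop: solve a regularized game, then reuse its solution as the next reference."""
    x_ref, y_ref = x0, y0
    gaps = [exploitability(A, x_ref, y_ref)]
    for _ in range(rounds):
        x_ref, y_ref = solve_regularized(A, x_ref, y_ref, tau)
        gaps.append(exploitability(A, x_ref, y_ref))
    return x_ref, y_ref, gaps

# Rock-paper-scissors payoffs for the row player; the unique equilibrium is uniform play.
A = np.array([[0.0, -1.0, 1.0],
              [1.0, 0.0, -1.0],
              [-1.0, 1.0, 0.0]])
x0 = np.array([0.7, 0.2, 0.1])  # arbitrary interior starting policies
y0 = np.array([0.1, 0.3, 0.6])

x, y, gaps = multi_round(A, x0, y0)
print("exploitability before:", gaps[0])
print("exploitability after: ", gaps[-1])
print("row policy:", x)
```

In this sketch each round's regularized game is easier to solve than the original one, and refreshing the reference policy between rounds keeps the regularization from biasing the final answer away from an equilibrium. The starting policies must have full support, since the update takes their logarithm.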
