Regularized Gradient Descent Ascent for Two-Player Zero-Sum Markov Games

Sihan Zeng; Thinh T. Doan; Justin Romberg

arXiv:2205.13746 · math.OC · October 13, 2022 · 1 citation

TL;DR

This paper introduces an entropy-regularized approach to solving two-player zero-sum Markov games, showing that gradient descent ascent converges to the Nash equilibrium under a proper choice of the regularization parameter and providing improved finite-time guarantees.

Contribution

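It shows that entropy regularization, with a properly chosen regularization parameter, lets gradient descent ascent converge to the Nash equilibrium of the original game even though the underlying objective is non-convex/non-concave, and it gives explicit finite-time performance bounds.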

Findings

01

Convergence of regularized gradient descent ascent (GDA) to the Nash equilibrium.

02

Improved finite-time performance guarantees.

03

Numerical simulations confirm accelerated convergence.

Abstract

We study the problem of finding the Nash equilibrium in a two-player zero-sum Markov game. Due to its formulation as a minimax optimization program, a natural approach to solving the problem is to perform gradient descent/ascent with respect to each player in an alternating fashion. However, due to the non-convexity/non-concavity of the underlying objective function, theoretical understanding of this method is limited. In this paper, we consider solving an entropy-regularized variant of the Markov game. The regularization introduces structure into the optimization landscape that makes the solutions more identifiable and allows the problem to be solved more efficiently. Our main contribution is to show that under proper choices of the regularization parameter, the gradient descent ascent algorithm converges to the Nash equilibrium of the original unregularized problem. We explicitly…
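The abstract describes the method only in words. The following is a minimal sketch, not the authors' algorithm or step-size schedule: it runs alternating gradient descent ascent on an entropy-regularized matrix game, i.e. the single-state special case of a Markov game. It assumes softmax-parameterized policies, a fixed regularization weight `tau`, a fixed step size `eta`, and the regularized objective f(x, y) = x^T A y - tau*H(x) + tau*H(y) with H the Shannon entropy. The names `regularized_gda`, `theta`, `phi`, and `RPS` are hypothetical and chosen for illustration.

```python
import math

def softmax(z):
    # Numerically stable softmax: subtract the max logit before exponentiating.
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def softmax_grad(p, g):
    # Chain rule through softmax: d f / d theta_i = p_i * (g_i - sum_j p_j g_j),
    # where g is the gradient of f with respect to the probabilities p.
    avg = sum(pi * gi for pi, gi in zip(p, g))
    return [pi * (gi - avg) for pi, gi in zip(p, g)]

def regularized_gda(A, tau, eta, theta, phi, iters):
    # Hypothetical sketch: the row player (policy x = softmax(theta)) minimizes and the
    # column player (policy y = softmax(phi)) maximizes the assumed objective
    # f(x, y) = x^T A y - tau * H(x) + tau * H(y).
    n, m = len(A), len(A[0])
    theta, phi = list(theta), list(phi)
    for _ in range(iters):
        x, y = softmax(theta), softmax(phi)
        # Gradient in x: A y + tau * (log x + 1); the constant is dropped because
        # softmax_grad subtracts the policy-weighted average anyway.
        gx = [sum(A[i][j] * y[j] for j in range(m)) + tau * math.log(x[i]) for i in range(n)]
        theta = [t - eta * d for t, d in zip(theta, softmax_grad(x, gx))]  # descent step
        # Alternating update: the maximizer responds to the freshly updated x.
        x = softmax(theta)
        gy = [sum(A[i][j] * x[i] for i in range(n)) - tau * math.log(y[j]) for j in range(m)]
        phi = [p + eta * d for p, d in zip(phi, softmax_grad(y, gy))]  # ascent step
    return softmax(theta), softmax(phi)

# Rock-paper-scissors: x^T A y is what the minimizing row player pays the maximizing column player.
RPS = [[0, 1, -1], [-1, 0, 1], [1, -1, 0]]

if __name__ == "__main__":
    x, y = regularized_gda(RPS, tau=0.5, eta=0.5,
                           theta=[0.5, 0.0, -0.5], phi=[0.0, 0.4, -0.4], iters=3000)
    print([round(v, 4) for v in x], [round(v, 4) for v in y])
```

For rock-paper-scissors, the regularized equilibrium is the uniform strategy for every tau > 0 by symmetry, so both printed policies should be close to (1/3, 1/3, 1/3). For a general payoff matrix, the regularized equilibrium moves with tau, which is one reason the regularization parameter has to be chosen with care. Setting `tau=0` in the call above removes the entropy terms and can be used to compare behavior.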

Videos

Regularized Gradient Descent Ascent for Two-Player Zero-Sum Markov Games · SlidesLive

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Bandit Algorithms Research · Markov Chains and Monte Carlo Methods