A Payoff-Based Policy Gradient Method in Stochastic Games with Long-Run Average Payoffs

Junyue Zhang; Yifen Mu

arXiv:2405.09811·cs.GT·May 17, 2024

Open Access

TL;DR

This paper introduces a novel payoff-based policy gradient algorithm for stochastic games with long-run average payoffs, proving its convergence to Nash equilibria under broad stability conditions.

Contribution

The paper derives an equivalent formulation of the individual payoff gradients, shows that these gradients are Lipschitz continuous in the policy profile, and constructs a bandit learning algorithm with a convergence proof for such games.

Findings

01  Gradient dominance property established for value functions

02  Algorithm converges to Nash equilibrium with probability one

03  Applicable to a wide class of stable stochastic games

Abstract

Despite their significant potential for applications, stochastic games with long-run average payoffs have received limited scholarly attention, particularly regarding the development of learning algorithms, because they are challenging to analyze mathematically. In this paper, we study stochastic games with long-run average payoffs and present an equivalent formulation of the individual payoff gradients by defining advantage functions, which we prove to be bounded. This result allows us to show that the individual payoff gradient function is Lipschitz continuous with respect to the policy profile and that the value function of the game exhibits the gradient dominance property. Leveraging these insights, we devise a payoff-based gradient estimation approach and integrate it with the Regularized Robbins-Monro method from stochastic approximation theory to construct…
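To make "payoff-based gradient estimation" combined with Robbins-Monro step sizes more concrete, here is a minimal single-player sketch. It is not the paper's algorithm: it assumes a one-shot concave payoff instead of a stochastic game, and it uses plain Euclidean projection rather than the regularized variant the abstract mentions. The payoff function, the target point, the step-size and exploration schedules, and all function names are hypothetical choices made for illustration.

```python
import math
import random


def payoff(x, target):
    # Hypothetical concave payoff, maximized at `target`.
    # The learner only observes payoff values, never gradients.
    return -sum((xi - ti) ** 2 for xi, ti in zip(x, target))


def random_unit_vector(dim, rng):
    # Normalized Gaussian sample: a uniform direction on the unit sphere.
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(vi * vi for vi in v))
    return [vi / norm for vi in v]


def single_point_gradient(x, delta, rng, target):
    # One payoff observation at a randomly perturbed point gives the
    # single-point estimate (dim / delta) * u(x + delta * z) * z.
    z = random_unit_vector(len(x), rng)
    query = [xi + delta * zi for xi, zi in zip(x, z)]
    scale = len(x) / delta * payoff(query, target)
    return [scale * zi for zi in z]


def run(num_steps=5000, seed=0):
    rng = random.Random(seed)
    target = [0.7, 0.3]   # assumed optimum, interior to the box [0, 1]^2
    x = [0.0, 1.0]        # arbitrary starting point
    for n in range(1, num_steps + 1):
        gamma = 1.0 / n       # steps: sum diverges, sum of squares converges
        delta = n ** -0.25    # exploration radius shrinks slowly
        g = single_point_gradient(x, delta, rng, target)
        # Projected ascent step: clip each coordinate back into [0, 1].
        x = [min(1.0, max(0.0, xi + gamma * gi)) for xi, gi in zip(x, g)]
    return x, target


if __name__ == "__main__":
    x, target = run()
    print("final iterate:", x, "target:", target)
```

The schedules are chosen so that the sum of `gamma**2 / delta**2` is finite, which keeps the accumulated noise of the single-point estimate under control in this toy setting. Other schedules with that property would serve equally well.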

Taxonomy

Topics: Economic Policies and Impacts · Risk and Portfolio Optimization · Optimization and Search Problems