Policy Gradient Methods Find the Nash Equilibrium in N-player General-sum Linear-quadratic Games

Ben Hambly; Renyuan Xu; Huining Yang

arXiv:2107.13090 · math.OC · August 16, 2022 · 1 citation

TL;DR

This paper proves that the natural policy gradient method converges globally to the Nash equilibrium in finite-horizon, general-sum N-player linear-quadratic games with stochastic dynamics, provided the system noise is sufficiently large. Numerical experiments illustrate the result.

Contribution

It establishes global convergence of the natural policy gradient method in stochastic N-player linear-quadratic games and gives a sufficient condition for convergence: essentially a lower bound on the noise covariance in terms of the model parameters.

Findings

01

The natural policy gradient method converges globally to the Nash equilibrium when the system noise is sufficiently large.

02

The convergence condition is essentially a lower bound on the noise covariance, expressed in terms of the model parameters.

03

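Numerical experiments show that adding noise leads to convergence even in cases where the policy gradient method may not converge in the deterministic setting.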

Abstract

We consider a general-sum N-player linear-quadratic game with stochastic dynamics over a finite horizon and prove the global convergence of the natural policy gradient method to the Nash equilibrium. In order to prove the convergence of the method, we require a certain amount of noise in the system. We give a condition, essentially a lower bound on the covariance of the noise in terms of the model parameters, in order to guarantee convergence. We illustrate our results with numerical experiments to show that even in situations where the policy gradient method may not converge in the deterministic setting, the addition of noise leads to convergence.
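The setting in the abstract can be illustrated with a minimal sketch, assuming a scalar two-player game with known model parameters and linear feedback policies u_t^i = -K_t^i x_t. Every number and name below (A, B, Q, R, W, SIGMA0, T, evaluate, gradients, natural_pg) is a hypothetical choice for illustration, not taken from the paper, and the update shown is a standard exact natural policy gradient step rather than necessarily the authors' exact algorithm or noise condition. Each player's gradient is divided by the second moment of the state, which stays strictly positive here because of the additive noise.

```python
# Hypothetical scalar two-player LQ game; all values are illustrative assumptions.
A = 0.5            # state transition coefficient
B = [0.5, 0.5]     # control coefficient of each player
Q = [1.0, 1.0]     # state cost weights (also used as terminal weights)
R = [1.0, 1.0]     # control cost weights
W = 0.1            # variance of the additive system noise
SIGMA0 = 1.0       # second moment of the initial state
T = 3              # horizon length
N = 2              # number of players


def evaluate(K):
    """Closed-loop coefficients, value coefficients P[i][t], and E[x_t^2]."""
    a_cl = [A - sum(B[j] * K[j][t] for j in range(N)) for t in range(T)]
    P = [[0.0] * (T + 1) for _ in range(N)]
    for i in range(N):
        P[i][T] = Q[i]
        for t in reversed(range(T)):
            P[i][t] = Q[i] + R[i] * K[i][t] ** 2 + a_cl[t] ** 2 * P[i][t + 1]
    sigma = [SIGMA0]
    for t in range(T - 1):
        sigma.append(a_cl[t] ** 2 * sigma[t] + W)  # noise keeps this positive
    return a_cl, P, sigma


def gradients(K):
    """Exact gradient of each player's cost with respect to its own gains."""
    a_cl, P, sigma = evaluate(K)
    grad = [
        [2.0 * (R[i] * K[i][t] - B[i] * P[i][t + 1] * a_cl[t]) * sigma[t]
         for t in range(T)]
        for i in range(N)
    ]
    return grad, sigma


def natural_pg(K, eta=0.1, iters=500):
    """All players update simultaneously with a natural gradient step."""
    for _ in range(iters):
        grad, sigma = gradients(K)
        K = [
            [K[i][t] - eta * grad[i][t] / sigma[t] for t in range(T)]
            for i in range(N)
        ]
    return K


K_star = natural_pg([[0.0] * T for _ in range(N)])
final_grad, _ = gradients(K_star)
residual = max(abs(g) for row in final_grad for g in row)
```

A vanishing residual means no player can lower its own cost by changing its own gains, which is the first-order condition for a Nash equilibrium among linear feedback policies. For these illustrative numbers the last-stage gain of each player works out to 1/6, since the stationarity condition at the final step reduces to 1.5 K = 0.25.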

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Adaptive Dynamic Programming Control