Policy Gradient Methods Find the Nash Equilibrium in N-player General-sum Linear-quadratic Games
Ben Hambly, Renyuan Xu, Huining Yang

TL;DR
This paper proves that the natural policy gradient method converges globally to the Nash equilibrium in finite-horizon, general-sum stochastic N-player linear-quadratic games, provided the system noise is sufficiently large; the analysis is supported by numerical experiments.
Contribution
The paper establishes global convergence of the natural policy gradient method in finite-horizon stochastic N-player linear-quadratic games and gives a lower bound on the noise covariance, in terms of the model parameters, that guarantees convergence.
Findings
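The natural policy gradient method converges to the Nash equilibrium when the system noise is sufficiently large.
The required lower bound on the noise covariance is expressed in terms of the model parameters.
Numerical experiments show that adding noise leads to convergence even in cases where the policy gradient method may not converge in the deterministic setting.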
Abstract
We consider a general-sum N-player linear-quadratic game with stochastic dynamics over a finite horizon and prove the global convergence of the natural policy gradient method to the Nash equilibrium. In order to prove the convergence of the method, we require a certain amount of noise in the system. We give a condition, essentially a lower bound on the covariance of the noise in terms of the model parameters, in order to guarantee convergence. We illustrate our results with numerical experiments to show that even in situations where the policy gradient method may not converge in the deterministic setting, the addition of noise leads to convergence.
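The natural policy gradient update described in the abstract can be sketched on a toy problem. The following is a minimal illustration, not the paper's algorithm or experiments: it assumes a scalar two-player game with dynamics x_{t+1} = A x_t + B^1 u^1_t + B^2 u^2_t + w_t, linear feedback policies u^i_t = -K^i_t x_t, quadratic costs, and exact (model-based) gradients. All numbers and names (A, B, Q, R, W, ETA, npg_step, and so on) are hypothetical choices made for the example. In this exact-gradient form the state variance cancels algebraically in the update, so the sketch only shows where noise enters (the variance must be nonzero for the update to be well defined); it does not reproduce the paper's noise condition.

```python
# Toy scalar two-player finite-horizon LQ game. All values are illustrative assumptions.
A, B = 0.9, (0.5, 0.5)        # x_{t+1} = A x_t + B[0] u0_t + B[1] u1_t + w_t
Q, R, Q_T = (1.0, 1.0), (1.0, 1.0), (1.0, 1.0)   # running and terminal cost weights
W, SIGMA0 = 0.1, 1.0          # variance of the noise w_t and of the initial state x_0
T, ETA, ITERS = 5, 0.1, 500   # horizon, step size, number of iterations


def closed_loop(K, t):
    """Closed-loop coefficient A - sum_j B^j K^j_t under policies u^j_t = -K^j_t x_t."""
    return A - sum(B[j] * K[j][t] for j in range(2))


def value_coeffs(K, i):
    """Backward recursion for P^i_t, the quadratic coefficient of player i's cost-to-go."""
    P = [0.0] * (T + 1)
    P[T] = Q_T[i]
    for t in range(T - 1, -1, -1):
        a = closed_loop(K, t)
        P[t] = Q[i] + R[i] * K[i][t] ** 2 + a * a * P[t + 1]
    return P


def state_variances(K):
    """Forward recursion for Sigma_t = E[x_t^2] under the current policies."""
    S = [SIGMA0]
    for t in range(T - 1):
        a = closed_loop(K, t)
        S.append(a * a * S[t] + W)
    return S


def residuals(K):
    """E^i_t = R^i K^i_t - B^i P^i_{t+1} A_cl,t; the exact gradient is 2 E^i_t Sigma_t."""
    E = []
    for i in range(2):
        P = value_coeffs(K, i)
        E.append([R[i] * K[i][t] - B[i] * P[t + 1] * closed_loop(K, t)
                  for t in range(T)])
    return E


def npg_step(K):
    """Simultaneous natural policy gradient step: K <- K - ETA * grad / Sigma_t."""
    E, S = residuals(K), state_variances(K)
    new_K = []
    for i in range(2):
        row = []
        for t in range(T):
            grad = 2.0 * E[i][t] * S[t]               # exact policy gradient
            row.append(K[i][t] - ETA * grad / S[t])   # requires Sigma_t > 0
        new_K.append(row)
    return new_K


K = [[0.0] * T for _ in range(2)]   # both players start from zero gains
for _ in range(ITERS):
    K = npg_step(K)

max_residual = max(abs(e) for row in residuals(K) for e in row)
print("gains of player 0:", K[0])
print("largest stationarity residual:", max_residual)
```

A residual near zero means each player's gain satisfies its own first-order (best-response) condition given the other player's gains, which is the sense in which the iteration is expected to settle at a Nash equilibrium of this toy game. Setting both W and SIGMA0 to zero makes the state variance zero and the division undefined, which is one simple way to see why the deterministic case needs separate treatment.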
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Adaptive Dynamic Programming Control
