Reinforcement Learning in Nonzero-sum Linear Quadratic Deep Structured Games: Global Convergence of Policy Optimization
Masoud Roudneshin, Jalal Arabneydi, Amir G. Aghdam

TL;DR
This paper proves that model-based and model-free policy gradient methods converge globally to the sub-game perfect Nash equilibrium in nonzero-sum linear quadratic (LQ) deep structured games, despite the non-convexity of the optimization in policy space.
Contribution
It introduces the first global convergence proof for policy optimization algorithms in nonzero-sum LQ games, applicable to both model-based and model-free settings.
Findings
Policy gradient descent and natural policy gradient descent converge globally to the sub-game perfect Nash equilibrium.
Algorithms have parameter spaces independent of the number of players.
Computational efficiency is improved when state dimension exceeds action dimension.
Abstract
We study model-based and model-free policy optimization in a class of nonzero-sum stochastic dynamic games called linear quadratic (LQ) deep structured games. In such games, players interact with each other through a set of weighted averages (linear regressions) of the states and actions. In this paper, we focus our attention on homogeneous weights; however, for the special case of an infinite population, the obtained results extend to asymptotically vanishing weights wherein the players learn the sequential weighted mean-field equilibrium. Despite the non-convexity of the optimization in policy space and the fact that policy optimization does not generally converge in game settings, we prove that the proposed model-based and model-free policy gradient descent and natural policy gradient descent algorithms globally converge to the sub-game perfect Nash equilibrium. To the best of our…
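To make the idea of policy optimization in an LQ setting concrete, here is a minimal sketch of model-based natural policy gradient descent on a single-agent, infinite-horizon, discrete-time LQ regulator. It is not the paper's algorithm: the game structure, the weighted averages of states and actions, and the model-free variant are all left out. The matrices `A`, `B`, `Q`, `R`, `Sigma0`, the step size, and the helper names are assumptions chosen purely for illustration. The sketch uses the standard LQR gradient formula, in which the cost of a linear policy u = -Kx has gradient 2 E_K Σ_K with E_K = (R + BᵀPB)K - BᵀPA.

```python
import numpy as np


def solve_discrete_lyapunov(a, q):
    """Solve X = a X a^T + q using the Kronecker (vectorized) form."""
    n = a.shape[0]
    lhs = np.eye(n * n) - np.kron(a, a)
    return np.linalg.solve(lhs, q.reshape(-1)).reshape(n, n)


def lqr_gradients(K, A, B, Q, R, Sigma0):
    """Return the cost, gradient, and natural gradient of the policy u = -K x."""
    A_cl = A - B @ K  # closed-loop dynamics
    # Cost-to-go matrix: P = A_cl^T P A_cl + Q + K^T R K
    P = solve_discrete_lyapunov(A_cl.T, Q + K.T @ R @ K)
    # Aggregate state covariance: Sigma = A_cl Sigma A_cl^T + Sigma0
    Sigma = solve_discrete_lyapunov(A_cl, Sigma0)
    E = (R + B.T @ P @ B) @ K - B.T @ P @ A
    grad = 2.0 * E @ Sigma   # plain policy gradient
    nat_grad = 2.0 * E       # gradient right-multiplied by Sigma^{-1}
    cost = np.trace(P @ Sigma0)
    return cost, grad, nat_grad


# Hypothetical 2-state, 1-input system; A is stable, so K = 0 is stabilizing.
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.eye(1)
Sigma0 = np.eye(2)  # covariance of the random initial state

K = np.zeros((1, 2))
step_size = 0.05  # assumed small enough for this example
costs = []
for _ in range(200):
    cost, _, nat_grad = lqr_gradients(K, A, B, Q, R, Sigma0)
    costs.append(cost)
    K = K - step_size * nat_grad

print("learned gain:", K)
print("initial cost:", costs[0], "final cost:", costs[-1])
```

In this toy problem the learned gain can be checked against the gain from the discrete-time Riccati equation. Note that `K` has shape (action dimension, state dimension), which is the kind of parameterization the findings refer to when they compare state and action dimensions.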
