Policy Optimization for Continuous-time Linear-Quadratic Graphon Mean Field Games

Philipp Plank; Yufei Zhang

arXiv:2506.05894·math.OC·June 9, 2025

Open Access

TL;DR

This paper introduces a scalable policy optimization method for continuous-time linear-quadratic graphon mean field games, providing convergence guarantees and demonstrating robustness through numerical experiments.

Contribution

The paper develops a bilevel policy gradient algorithm tailored to continuous-time, finite-horizon linear-quadratic GMFGs, supported by theoretical convergence guarantees and numerical experiments.

Findings

01. Linear convergence of policy gradient to best-response policies

02. Global convergence of the algorithm to Nash equilibrium

03. Robust performance across different graphon structures and noise levels
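
To make the first two findings concrete, the sketch below runs a bilevel loop on a toy static quadratic game. This is not the paper's continuous-time model: the cost function, the rank-one graphon, the coupling strength `c`, the step size `lr`, and the function name `bilevel_toy` are all illustrative assumptions. The inner loop takes gradient steps toward the best response under a frozen aggregate, and the outer loop refreshes the aggregate from the resulting actions.

```python
import numpy as np

def bilevel_toy(W, b, c=0.5, lr=0.5, inner_steps=40, outer_steps=30):
    """Toy static graphon game (illustrative assumption, not the paper's model).

    Player i picks a scalar action a_i to minimise
        J_i(a_i; z_i) = 0.5 * (a_i - c * z_i - b_i) ** 2,
    where z_i = (1/n) * sum_j W[i, j] * a_j is a graphon-weighted aggregate.
    """
    n = len(b)
    a = np.zeros(n)
    inner_errors = []
    for _ in range(outer_steps):
        z = W @ a / n              # outer level: freeze the population aggregate
        target = c * z + b         # exact best response to the frozen aggregate
        for _ in range(inner_steps):
            grad = a - target      # gradient of J_i with respect to a_i
            a = a - lr * grad      # inner level: plain gradient step
            inner_errors.append(np.max(np.abs(a - target)))
    return a, np.array(inner_errors)

n = 20
x = (np.arange(n) + 0.5) / n       # player labels on a midpoint grid of [0, 1]
W = np.outer(x, x)                 # hypothetical rank-one graphon W(x, y) = x * y
b = 1.0 + x                        # hypothetical player-specific offsets
a, errs = bilevel_toy(W, b)

# Equilibrium of the toy game solves a = c * W a / n + b.
a_star = np.linalg.solve(np.eye(n) - 0.5 * W / n, b)
print("distance to equilibrium:", np.max(np.abs(a - a_star)))
print("inner error ratio:", errs[1] / errs[0])
```

In this toy, each inner gradient step shrinks the distance to the best response by the constant factor `1 - lr`, which is what linear convergence means, and the outer iteration is a contraction because `c` times the row-averaged graphon weight stays below one. The paper's guarantees concern a dynamic linear-quadratic setting and rest on its own assumptions, which this page does not state.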

Abstract

Multi-agent reinforcement learning, despite its popularity and empirical success, faces significant scalability challenges in large-population dynamic games. Graphon mean field games (GMFGs) offer a principled framework for approximating such games while capturing heterogeneity among players. In this paper, we propose and analyze a policy optimization framework for continuous-time, finite-horizon linear-quadratic GMFGs. Exploiting the structural properties of GMFGs, we design an efficient policy parameterization in which each player's policy is represented as an affine function of their private state, with a shared slope function and player-specific intercepts. We develop a bilevel optimization algorithm that alternates between policy gradient updates for best-response computation under a fixed population distribution, and distribution updates using the resulting policies. We prove…
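
The policy parameterization described in the abstract can be sketched on a discretized grid. The following is a minimal illustration assuming scalar states and controls, `N` player labels, and `T` time steps. The names `make_affine_policy` and `graphon_aggregate` are hypothetical, and the graphon-weighted mean used for the aggregate is the usual form in graphon mean field games rather than something stated on this page.

```python
import numpy as np

def make_affine_policy(K, G):
    """Return u(i, k, x) = K[k] * x + G[i, k].

    K: shape (T,)    slope shared by all players, indexed by time step k.
    G: shape (N, T)  intercepts, one row per player label i.
    Scalar states and controls are a simplifying assumption.
    """
    K = np.asarray(K, dtype=float)
    G = np.asarray(G, dtype=float)
    if G.ndim != 2 or G.shape[1] != K.shape[0]:
        raise ValueError("G must have shape (N, T) with T == len(K)")

    def policy(i, k, x):
        # Affine in the private state x: shared slope, player-specific intercept.
        return K[k] * x + G[i, k]

    return policy

def graphon_aggregate(W, m):
    """Riemann-sum approximation of z(alpha) = integral of W(alpha, beta) * m(beta) d(beta).

    W: shape (N, N) graphon sampled on a uniform grid of player labels.
    m: shape (N,)   mean state of each player label.
    """
    W = np.asarray(W, dtype=float)
    m = np.asarray(m, dtype=float)
    return W @ m / W.shape[1]

# Hypothetical values, chosen only to exercise the functions.
K = np.array([-1.0, -0.5, -0.25])        # T = 3 time steps
G = 0.1 * np.arange(12).reshape(4, 3)    # N = 4 players
policy = make_affine_policy(K, G)
u = policy(2, 1, 2.0)                    # player 2, time step 1, state 2.0
z = graphon_aggregate(np.ones((4, 4)), np.array([1.0, 2.0, 3.0, 4.0]))
```

On this grid, sharing the slope means the policy class has `T + N * T` parameters instead of the `2 * N * T` that fully player-specific affine policies would need.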

Taxonomy

Topics: Reinforcement Learning in Robotics · Stochastic Gradient Optimization Techniques · Adaptive Dynamic Programming Control