Policy Optimization for Continuous-time Linear-Quadratic Graphon Mean Field Games
Philipp Plank, Yufei Zhang

TL;DR
This paper introduces a scalable policy optimization method for continuous-time linear-quadratic graphon mean field games, providing convergence guarantees and demonstrating robustness through numerical experiments.
Contribution
The paper develops a bilevel policy gradient algorithm tailored to continuous-time linear-quadratic graphon mean field games (GMFGs), with theoretical convergence guarantees and numerical evidence of practical effectiveness.
Findings
Linear convergence of policy gradient to best-response policies
Global convergence of the algorithm to Nash equilibrium
Robust performance across different graphon structures and noise levels
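For reference, "linear convergence" in the first finding is usually understood as geometric decay of an error measure along the iterations. A generic form of such a statement is given here; the symbols e_k, C, and \rho are placeholders and are not taken from the paper, whose exact error measure and constants are not stated in this summary.

```latex
% Generic linear (geometric) convergence: e_k is an error measure at iteration k,
% e.g. a cost gap to the best response; C and \rho are placeholder constants.
e_k \le C \, \rho^{k} \, e_0 , \qquad C > 0 , \quad \rho \in (0, 1) , \quad k = 0, 1, 2, \dots
```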
Abstract
Multi-agent reinforcement learning, despite its popularity and empirical success, faces significant scalability challenges in large-population dynamic games. Graphon mean field games (GMFGs) offer a principled framework for approximating such games while capturing heterogeneity among players. In this paper, we propose and analyze a policy optimization framework for continuous-time, finite-horizon linear-quadratic GMFGs. Exploiting the structural properties of GMFGs, we design an efficient policy parameterization in which each player's policy is represented as an affine function of their private state, with a shared slope function and player-specific intercepts. We develop a bilevel optimization algorithm that alternates between policy gradient updates for best-response computation under a fixed population distribution, and distribution updates using the resulting policies. We prove…
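The alternating structure described in the abstract can be illustrated with a small toy sketch. This is not the paper's algorithm or model. It assumes a scalar state, an Euler-Maruyama time discretization, N sampled player labels, a hypothetical tracking cost, and finite-difference gradients with backtracking in place of the paper's policy gradient estimator. All function names, parameters, and coefficients are illustrative. Policies take the affine form u = K[t] * x + G[alpha, t], with a shared slope K and player-specific intercepts G. For simplicity the sketch minimizes the cost averaged over players.

```python
import numpy as np

def graphon_aggregate(W, means):
    """z[a, t] = (1/N) * sum_b W[a, b] * means[b, t], a Riemann sum of the graphon integral."""
    return W @ means / W.shape[0]

def rollout_moments(K, G, z, m0, v0, p):
    """Exact mean (N, T+1) and variance (T+1,) of the Euler-Maruyama scheme under u = K[t]*x + G[a, t]."""
    T = K.shape[0]
    m = np.zeros((G.shape[0], T + 1))
    v = np.zeros(T + 1)
    m[:, 0], v[0] = m0, v0
    for t in range(T):
        gain = 1.0 + p["dt"] * (p["a"] + p["b"] * K[t])  # closed-loop one-step multiplier
        m[:, t + 1] = gain * m[:, t] + p["dt"] * (p["b"] * G[:, t] + p["c"] * z[:, t])
        v[t + 1] = gain ** 2 * v[t] + p["sigma"] ** 2 * p["dt"]
    return m, v

def average_cost(K, G, z, m0, v0, p):
    """Player-averaged expected cost: q * E[(x - z)^2] + r * E[u^2], plus a terminal tracking term."""
    m, v = rollout_moments(K, G, z, m0, v0, p)
    T = K.shape[0]
    u_mean = K * m[:, :T] + G  # E[u], shape (N, T)
    run = p["q"] * (v[:T] + (m[:, :T] - z[:, :T]) ** 2) + p["r"] * (K ** 2 * v[:T] + u_mean ** 2)
    terminal = p["q"] * (v[T] + (m[:, T] - z[:, T]) ** 2)
    return float(np.mean(p["dt"] * run.sum(axis=1) + terminal))

def fd_gradient(f, theta, eps=1e-6):
    """Central finite-difference gradient (a stand-in for a policy gradient estimator)."""
    g = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = eps
        g[i] = (f(theta + e) - f(theta - e)) / (2 * eps)
    return g

def best_response(theta, z, m0, v0, p, N, T, steps=20, lr=0.5):
    """Inner loop: gradient descent on (K, G) with the aggregate z held fixed."""
    def cost(th):
        return average_cost(th[:T], th[T:].reshape(N, T), z, m0, v0, p)
    for _ in range(steps):
        g, c, step = fd_gradient(cost, theta), cost(theta), lr
        # Backtracking: halve the step until the cost does not increase.
        while step > 1e-8 and not (cost(theta - step * g) <= c):
            step *= 0.5
        if step > 1e-8:
            theta = theta - step * g
    return theta

def solve_gmfg(W, m0, v0, p, T, outer=5):
    """Outer loop: alternate best-response updates and population-mean updates."""
    N = W.shape[0]
    theta = np.zeros(T + N * T)
    means = np.tile(m0[:, None], (1, T + 1))  # initial guess: means frozen at m0
    for _ in range(outer):
        z = graphon_aggregate(W, means)
        theta = best_response(theta, z, m0, v0, p, N, T)
        means, _ = rollout_moments(theta[:T], theta[T:].reshape(N, T), z, m0, v0, p)
    return theta[:T], theta[T:].reshape(N, T), means

# Hypothetical coefficients and a min-graphon sampled at N midpoints of [0, 1].
params = dict(a=-0.5, b=1.0, c=0.3, sigma=0.2, q=1.0, r=1.0, dt=0.1)
labels = (np.arange(4) + 0.5) / 4
W_min = np.minimum.outer(labels, labels)
K_opt, G_opt, means_opt = solve_gmfg(W_min, np.linspace(-1.0, 1.0, 4), 0.1, params, T=10)
```

Because the dynamics are linear and the policy is affine, the expected quadratic cost depends only on the state mean and variance, which is why the sketch propagates moments instead of sampling trajectories. In this toy setup the variance is common to all players because the slope K is shared, while the intercepts G shift each player's mean.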
Taxonomy
Topics: Reinforcement Learning in Robotics · Stochastic Gradient Optimization Techniques · Adaptive Dynamic Programming Control
