Finite-Sample Convergence Bounds for Trust Region Policy Optimization in Mean-Field Games
Antonio Ocello, Daniil Tiapkin, Lorenzo Mancini, Mathieu Laurière, Eric Moulines

TL;DR
This paper introduces MF-TRPO, an algorithm extending trust region policy optimization to mean-field games, providing convergence guarantees and finite-sample complexity analysis for finding approximate Nash equilibria in finite state-action spaces.
Contribution
It adapts TRPO to the mean-field game setting, offering the first theoretical convergence and sample complexity guarantees for this class of algorithms.
Findings
Proves convergence of MF-TRPO under standard assumptions.
Derives finite-sample complexity bounds for the algorithm.
Provides high-probability guarantees for sample-based implementation.
Abstract
We introduce Mean-Field Trust Region Policy Optimization (MF-TRPO), a novel algorithm designed to compute approximate Nash equilibria for ergodic Mean-Field Games (MFG) in finite state-action spaces. Building on the well-established performance of TRPO in the reinforcement learning (RL) setting, we extend its methodology to the MFG framework, leveraging its stability and robustness in policy optimization. Under standard assumptions in the MFG literature, we provide a rigorous analysis of MF-TRPO, establishing theoretical guarantees on its convergence. Our results cover both the exact formulation of the algorithm and its sample-based counterpart, where we derive high-probability guarantees and finite sample complexity. This work advances MFG optimization by bridging RL techniques with mean-field decision-making, offering a theoretically grounded approach to solving complex multi-agent…
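The abstract names two ingredients: a trust-region (KL-regularized) policy step and an ergodic mean-field. The sketch below shows one common tabular form of each, and it is not the paper's MF-TRPO. It assumes the policy step is the closed-form maximizer of a linear objective with a KL penalty, and that the mean-field is the stationary distribution of the chain induced by the current policy. The function names, the step size `eta`, and the toy numbers are all hypothetical.

```python
import numpy as np


def kl_proximal_step(pi, Q, eta):
    """Per-state maximizer of <p, Q[s]> - (1/eta) * KL(p || pi[s]).

    The closed form is p(a|s) proportional to pi(a|s) * exp(eta * Q[s, a]).
    pi and Q have shape (n_states, n_actions); pi must be strictly positive.
    """
    logits = np.log(pi) + eta * Q
    logits -= logits.max(axis=1, keepdims=True)  # for numerical stability
    new_pi = np.exp(logits)
    return new_pi / new_pi.sum(axis=1, keepdims=True)


def stationary_distribution(P_pi, iters=500):
    """Approximate the stationary distribution of an ergodic chain.

    P_pi is a row-stochastic (n_states, n_states) transition matrix under a
    fixed policy; plain power iteration is used (an illustrative choice).
    """
    n = P_pi.shape[0]
    mu = np.full(n, 1.0 / n)
    for _ in range(iters):
        mu = mu @ P_pi
    return mu / mu.sum()


# Toy usage (assumed numbers): one state, two actions, uniform start.
pi0 = np.array([[0.5, 0.5]])
Q0 = np.array([[1.0, 0.0]])
pi1 = kl_proximal_step(pi0, Q0, eta=np.log(3.0))

# Toy two-state chain standing in for the dynamics under a fixed policy.
mu = stationary_distribution(np.array([[0.9, 0.1], [0.5, 0.5]]))
```

With `eta = ln 3` and a Q-value gap of 1, the update moves the action odds from 1:1 to 3:1, which shows how the KL penalty keeps the new policy close to the old one. In a mean-field game the Q-values would themselves depend on the population distribution, a coupling this sketch leaves out.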
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Game Theory and Applications
Methods: Trust Region Policy Optimization
