Finite-Sample Convergence Bounds for Trust Region Policy Optimization in Mean-Field Games

Antonio Ocello; Daniil Tiapkin; Lorenzo Mancini; Mathieu Laurière; Eric Moulines

arXiv:2505.22781·stat.ML·May 30, 2025

Open Access

TL;DR

This paper introduces MF-TRPO, an algorithm extending trust region policy optimization to mean-field games, providing convergence guarantees and finite-sample complexity analysis for finding approximate Nash equilibria in finite state-action spaces.

Contribution

The paper adapts TRPO to the mean-field game setting and offers the first theoretical convergence and sample complexity guarantees for this class of algorithms.

Findings

01. Proves convergence of MF-TRPO under standard assumptions.

02. Derives finite-sample complexity bounds for the algorithm.

03. Provides high-probability guarantees for sample-based implementation.

Abstract

We introduce Mean-Field Trust Region Policy Optimization (MF-TRPO), a novel algorithm designed to compute approximate Nash equilibria for ergodic Mean-Field Games (MFG) in finite state-action spaces. Building on the well-established performance of TRPO in the reinforcement learning (RL) setting, we extend its methodology to the MFG framework, leveraging its stability and robustness in policy optimization. Under standard assumptions in the MFG literature, we provide a rigorous analysis of MF-TRPO, establishing theoretical guarantees on its convergence. Our results cover both the exact formulation of the algorithm and its sample-based counterpart, where we derive high-probability guarantees and finite sample complexity. This work advances MFG optimization by bridging RL techniques with mean-field decision-making, offering a theoretically grounded approach to solving complex multi-agent…
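To make the two ingredients named in the abstract concrete (a trust-region style policy update and an ergodic population distribution on a finite state-action space), here is a minimal Python sketch. It is not the paper's MF-TRPO algorithm. It assumes a KL-penalized policy step, which has a well-known closed form on finite action sets, and a fixed transition kernel that does not depend on the population. The names `kl_regularized_step`, `stationary_distribution`, `q_values`, and `step_size` are hypothetical and chosen for illustration only.

```python
import numpy as np


def kl_regularized_step(policy, q_values, step_size):
    """One KL-penalized improvement step, applied independently per state.

    Assumption (not from the paper): the update solves
        max_p  <p, q[s]> - (1 / step_size) * KL(p || policy[s]),
    whose closed form is p(a) proportional to policy[s, a] * exp(step_size * q[s, a]).
    `policy` must be strictly positive so that its logarithm is finite.
    """
    logits = np.log(policy) + step_size * q_values
    logits -= logits.max(axis=1, keepdims=True)  # shift for numerical stability
    new_policy = np.exp(logits)
    return new_policy / new_policy.sum(axis=1, keepdims=True)


def stationary_distribution(transition, policy, n_iters=1000):
    """Power iteration for the stationary state distribution under `policy`.

    Assumption: transition[s, a, t] is a fixed kernel giving the probability
    of moving from state s to state t under action a, and the induced chain
    is ergodic. In a mean-field game the kernel and rewards may also depend
    on the population distribution; that dependence is omitted here.
    """
    # State-to-state kernel induced by the policy.
    state_kernel = np.einsum("sa,sat->st", policy, transition)
    n_states = transition.shape[0]
    mu = np.full(n_states, 1.0 / n_states)
    for _ in range(n_iters):
        mu = mu @ state_kernel
    return mu


# Toy example with 2 states and 2 actions (all numbers are made up).
toy_policy = np.full((2, 2), 0.5)
toy_q = np.array([[1.0, 0.0], [0.0, 2.0]])
toy_transition = np.array(
    [[[0.9, 0.1], [0.2, 0.8]],
     [[0.5, 0.5], [0.3, 0.7]]]
)

updated_policy = kl_regularized_step(toy_policy, toy_q, step_size=1.0)
population = stationary_distribution(toy_transition, updated_policy)
```

In the toy example, the updated policy shifts probability toward the action with the larger value in each state while staying close to the previous policy. A smaller `step_size` keeps the new policy closer to the old one, which is the stabilizing effect that trust-region methods aim for.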

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Game Theory and Applications

Methods: Trust Region Policy Optimization