Model-free policy gradient for discrete-time mean-field control
Matthieu Meunier, Huyên Pham, Christoph Reisinger

TL;DR
This paper introduces a model-free policy gradient method for discrete-time mean-field control problems with finite state spaces, addressing the difficulty that transition kernels and rewards depend on the evolving population state distribution.
Contribution
It proposes a new perturbation scheme on the state-distribution flow and the MF-REINFORCE algorithm, enabling policy learning without an explicit model of the environment.
Findings
The gradient of the perturbed value function converges to the true policy gradient as the perturbation magnitude vanishes.
MF-REINFORCE achieves effective policy learning in mean-field control tasks.
Explicit bounds on bias and mean-squared error are established.
Abstract
We study model-free policy learning for discrete-time mean-field control (MFC) problems with finite state space and compact action space. In contrast to the extensive literature on value-based methods for MFC, policy-based approaches remain largely unexplored due to the intrinsic dependence of transition kernels and rewards on the evolving population state distribution, which prevents the direct use of likelihood-ratio estimators of policy gradients from classical single-agent reinforcement learning. We introduce a novel perturbation scheme on the state-distribution flow and prove that the gradient of the resulting perturbed value function converges to the true policy gradient as the perturbation magnitude vanishes. This construction yields a fully model-free estimator based solely on simulated trajectories and an auxiliary estimate of the sensitivity of the state distribution. Building…
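To make the population dependence concrete, here is a minimal Python sketch of a finite-state distribution flow in which the transition kernel depends on the current state distribution. It is not the paper's algorithm or perturbation scheme: the kernel form, the `coupling` parameter, the finite action set (the paper allows a compact action space), and the names `kernel` and `flow` are all assumptions made for illustration.

```python
import numpy as np


def kernel(mu, base, coupling=0.5):
    """Hypothetical population-dependent kernel P[x, a, x'].

    Mixes a fixed base kernel with the current distribution mu, so the
    next-state law changes whenever the population distribution changes.
    """
    return (1.0 - coupling) * base + coupling * mu[None, None, :]


def flow(mu0, policy, base, horizon, coupling=0.5):
    """Propagate mu_{t+1}(x') = sum_{x,a} mu_t(x) pi(a|x) P(x'|x,a,mu_t)."""
    mus = [mu0]
    for _ in range(horizon):
        mu = mus[-1]
        P = kernel(mu, base, coupling)  # depends on mu, hence on the policy
        mus.append(np.einsum("x,xa,xay->y", mu, policy, P))
    return np.stack(mus)


# Illustrative setup: 3 states, 2 actions, random base kernel and policy.
rng = np.random.default_rng(0)
base = rng.dirichlet(np.ones(3), size=(3, 2))   # shape (3, 2, 3)
policy = rng.dirichlet(np.ones(2), size=3)      # shape (3, 2)
mu0 = np.array([1.0, 0.0, 0.0])
mus = flow(mu0, policy, base, horizon=5)        # shape (6, 3)
```

Because `kernel` is evaluated at a distribution that itself depends on the policy, differentiating a trajectory's log-likelihood with respect to the policy parameters would involve the kernel's sensitivity to `mu`, not just the policy's score. This is one way to read the abstract's remark that classical likelihood-ratio estimators do not apply directly.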
Taxonomy
Topics: Adaptive Dynamic Programming Control · Reinforcement Learning in Robotics · Model Reduction and Neural Networks
