Policy Gradient for Continuous-Time Mean-Field Control
Erhan Bayraktar, Martin Hernandez, Qinxin Yan, Yuhua Zhu

TL;DR
This paper introduces a policy gradient method for entropy-regularized mean-field control in continuous time. Once the value function under a fixed policy is known, the gradient is computed directly from it, without solving an additional equation.
Contribution
It derives an explicit policy gradient formula for mean-field control, facilitating a model-based actor-critic approach without extra PDE solutions.
Findings
Developed a Gâteaux policy-gradient formula for mean-field control.
Implemented a model-based actor-critic scheme using the formula.
Validated the approach on LQR and crowd-motion models.
Abstract
This paper develops a policy gradient method for entropy-regularized mean-field control in the discounted infinite-horizon setting. We consider randomized feedback policies and a coupled representative-particle/population system, in which the representative state evolves jointly with a population law governed by a McKean-Vlasov equation. The resulting value function is therefore defined on the product of the state space and the space of probability measures. A key distinction from existing policy gradient methods for mean-field control is that, after computing the value function under a fixed policy, our approach does not require solving an additional equation to obtain the policy gradient. Instead, we derive an explicit policy gradient formula directly in terms of the value function. The formulation is based on an instantaneous advantage function, which quantifies the gain of taking a…
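To make the setting in the abstract concrete, the following is a minimal Python sketch of evaluating a randomized feedback policy on a particle approximation of a mean-field system. It is not the paper's algorithm or model: the 1-D linear-quadratic dynamics, the running cost, the Gaussian policy, the function name simulate_discounted_cost, and all parameters (k_state, k_mean, sigma_pi, beta, tau, noise) are assumptions chosen for illustration. The population law is replaced by the empirical distribution of the particles, and only its mean enters the policy and the cost.

```python
import numpy as np


def simulate_discounted_cost(k_state, k_mean, sigma_pi, n_particles=2000,
                             dt=0.01, horizon=10.0, beta=1.0, tau=0.1,
                             noise=0.3, seed=0):
    """Monte Carlo estimate of a discounted, entropy-regularized cost.

    Hypothetical 1-D toy model (not taken from the paper):
      dynamics  dX = a dt + noise dW
      policy    a ~ N(-k_state * X - k_mean * mean(X), sigma_pi^2)
      cost      X^2 + (X - mean(X))^2 + a^2 - tau * entropy(policy)
    The infinite horizon is truncated at `horizon`.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(1.0, 0.5, size=n_particles)  # assumed initial law
    # Differential entropy of a Gaussian; it depends only on sigma_pi.
    entropy = 0.5 * np.log(2.0 * np.pi * np.e * sigma_pi ** 2)
    total = 0.0
    n_steps = int(round(horizon / dt))
    for step in range(n_steps):
        t = step * dt
        mean = x.mean()  # empirical proxy for the population law
        # Sample one action per particle from the randomized policy.
        a = (-k_state * x - k_mean * mean
             + sigma_pi * rng.normal(size=n_particles))
        running = x ** 2 + (x - mean) ** 2 + a ** 2 - tau * entropy
        # Left-endpoint quadrature of the discounted cost integral.
        total += np.exp(-beta * t) * running.mean() * dt
        # Euler-Maruyama step for the controlled particles.
        x = x + a * dt + noise * np.sqrt(dt) * rng.normal(size=n_particles)
    return total


if __name__ == "__main__":
    for k in (0.0, 0.5, 1.0):
        cost = simulate_discounted_cost(k, 0.0, 0.5)
        print(f"k_state={k:.1f}  estimated cost={cost:.3f}")
```

Comparing the estimate across values of k_state gives a crude picture of how the policy parameters affect the objective. A policy gradient method such as the one the abstract describes would replace this brute-force comparison with a gradient computed from the value function.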
