Policy Gradient for Continuous-Time Mean-Field Control

Erhan Bayraktar; Martin Hernandez; Qinxin Yan; Yuhua Zhu

arXiv:2605.20718·math.OC·May 21, 2026

TL;DR

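This paper introduces a policy gradient method for entropy-regularized mean-field control in continuous time, in the discounted infinite-horizon setting. Once the value function under a fixed policy has been computed, the policy gradient follows from an explicit formula, with no additional equation to solve.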

Contribution

The paper derives an explicit policy gradient formula for mean-field control, expressed directly in terms of the value function. This supports a model-based actor-critic approach that requires no PDE solve beyond the value function itself.

Findings

01

Developed a Gâteaux policy-gradient formula for mean-field control.

02

Implemented a model-based actor-critic scheme using the formula.

03

Validated the approach on LQR and crowd-motion models.

Abstract

This paper develops a policy gradient method for entropy-regularized mean-field control in the discounted infinite-horizon setting. We consider randomized feedback policies and a coupled representative-particle/population system, in which the representative state evolves jointly with a population law governed by a McKean-Vlasov equation. The resulting value function is therefore defined on the product space $\mathbb{R}^{d} \times \mathcal{P}_{2}(\mathbb{R}^{d})$. A key distinction from existing policy gradient methods for mean-field control is that, after computing the value function under a fixed policy, our approach does not require solving an additional equation to obtain the policy gradient. Instead, we derive an explicit policy gradient formula directly in terms of the value function. The formulation is based on an instantaneous advantage function, which quantifies the gain of taking a…
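The setting described in the abstract (a randomized feedback policy acting on particles whose dynamics depend on the population law, with a discounted entropy-regularized objective) can be illustrated with a small particle simulation. The sketch below is not the paper's algorithm and does not compute a policy gradient. It only estimates the cost of one fixed Gaussian policy. Everything concrete in it is an assumption made for illustration: the one-dimensional linear-quadratic model, the coefficients `a`, `a_bar`, `b`, `sigma`, `q`, `q_bar`, `r`, the discount rate `beta`, the temperature `lam`, the use of the empirical mean as a stand-in for the population law, the Euler-Maruyama discretization, and the truncation of the infinite horizon.

```python
import math
import random


def simulate_cost(k, k_bar, s, lam=0.05, n_particles=200, dt=0.01,
                  horizon=5.0, seed=0):
    """Monte Carlo estimate of a discounted entropy-regularized cost.

    Hypothetical 1-D model (not taken from the paper):
        dX = (a*X + a_bar*m + b*u) dt + sigma dW,   m = population mean,
        u ~ N(-k*X - k_bar*m, s^2)                  (randomized feedback policy),
        running cost = q*X^2 + q_bar*m^2 + r*u^2 - lam * entropy(policy).
    """
    rng = random.Random(seed)
    a, a_bar, b, sigma = -0.5, 0.2, 1.0, 0.3   # assumed dynamics coefficients
    q, q_bar, r = 1.0, 0.5, 0.1                # assumed cost weights
    beta = 1.0                                 # assumed discount rate

    # Differential entropy of a Gaussian with standard deviation s.
    entropy = 0.5 * math.log(2.0 * math.pi * math.e * s * s)

    xs = [rng.gauss(1.0, 0.5) for _ in range(n_particles)]
    total = 0.0
    steps = int(horizon / dt)
    for i in range(steps):
        # The empirical mean approximates the population law's first moment.
        m = sum(xs) / n_particles
        discount = math.exp(-beta * i * dt)
        running = 0.0
        new_xs = []
        for x in xs:
            u = rng.gauss(-k * x - k_bar * m, s)   # sample an action
            running += q * x * x + q_bar * m * m + r * u * u - lam * entropy
            drift = a * x + a_bar * m + b * u
            noise = sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
            new_xs.append(x + drift * dt + noise)  # Euler-Maruyama step
        total += discount * (running / n_particles) * dt
        xs = new_xs
    return total


if __name__ == "__main__":
    print(simulate_cost(k=1.0, k_bar=0.0, s=0.5))
```

Because `lam` does not influence the random draws, calling the function twice with the same seed and different `lam` values compares the same trajectories, which isolates the effect of the entropy term on the estimated cost.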
