Convergence of Policy Gradient for Entropy Regularized MDPs with Neural Network Approximation in the Mean-Field Regime
Bekzhan Kerimkulov, James-Michael Leahy, David Šiška, and Lukasz Szpruch

TL;DR
This paper proves that, in a mean-field regime and with sufficient regularization, policy gradient with one-hidden-layer neural network approximation for entropy-regularized MDPs converges exponentially fast to the unique maximizer of the regularized objective, extending prior tabular results to continuous state and action spaces.
Contribution
It establishes the global convergence of policy gradient with neural networks in the mean-field setting for continuous MDPs, including exponential convergence rates and sensitivity analysis.
Findings
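The objective function increases along the gradient flow.
If the regularization of the mean-field measure is sufficient, the gradient flow converges exponentially fast to the unique stationary solution, which is the unique maximizer of the regularized MDP objective.
The sensitivity of the value function along the gradient flow is studied with respect to the regularization parameters and the initial condition.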
Abstract
We study the global convergence of policy gradient for infinite-horizon, continuous state and action space, and entropy-regularized Markov decision processes (MDPs). We consider a softmax policy with (one-hidden layer) neural network approximation in a mean-field regime. Additional entropic regularization in the associated mean-field probability measure is added, and the corresponding gradient flow is studied in the 2-Wasserstein metric. We show that the objective function is increasing along the gradient flow. Further, we prove that if the regularization in terms of the mean-field measure is sufficient, the gradient flow converges exponentially fast to the unique stationary solution, which is the unique maximizer of the regularized MDP objective. Lastly, we study the sensitivity of the value function along the gradient flow with respect to regularization parameters and the initial…
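The parametrization described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a finite particle approximation of the mean-field measure, a tanh activation, Gaussian initialization, and a uniform grid of actions standing in for the continuous action space. All names (`make_particles`, `mean_field_logit`, `softmax_policy`) are hypothetical.

```python
import math
import random


def make_particles(n, dim, seed=0):
    """Draw n hidden-unit parameters (c, w, b) i.i.d. from a standard Gaussian.

    Assumption: the empirical measure of these particles stands in for the
    mean-field probability measure over network parameters.
    """
    rng = random.Random(seed)
    return [
        (rng.gauss(0, 1), [rng.gauss(0, 1) for _ in range(dim)], rng.gauss(0, 1))
        for _ in range(n)
    ]


def mean_field_logit(particles, state, action):
    """f(s, a) = (1/N) * sum_i c_i * tanh(w_i . (s, a) + b_i).

    The 1/N factor is the mean-field scaling: the network output is an
    average over hidden units rather than a sum.
    """
    x = list(state) + [action]
    total = 0.0
    for c, w, b in particles:
        pre = sum(wi * xi for wi, xi in zip(w, x)) + b
        total += c * math.tanh(pre)
    return total / len(particles)


def softmax_policy(particles, state, action_grid):
    """pi(a | s) proportional to exp(f(s, a)) on a finite action grid.

    Assumption: the grid with uniform weights replaces the reference
    measure on the continuous action space.
    """
    logits = [mean_field_logit(particles, state, a) for a in action_grid]
    m = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - m) for l in logits]
    z = sum(weights)
    return [w / z for w in weights]


particles = make_particles(n=500, dim=3)  # 2-d state plus 1-d action
action_grid = [-1.0 + 0.25 * k for k in range(9)]
probs = softmax_policy(particles, state=[0.3, -0.7], action_grid=action_grid)
```

Here `probs` is a probability vector over the nine grid actions. The gradient flow studied in the paper would move the particles themselves; that step is not shown in this sketch.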
Taxonomy
Topics: Markov Chains and Monte Carlo Methods · Adversarial Robustness in Machine Learning · Stochastic Gradient Optimization Techniques
Methods: Softmax · Multi-partition Embedding Interaction
