Convergence of Policy Gradient for Entropy Regularized MDPs with Neural Network Approximation in the Mean-Field Regime

Bekzhan Kerimkulov; James-Michael Leahy; David Šiška; Lukasz Szpruch

arXiv:2201.07296 · math.OC · June 17, 2022 · 1 citation

TL;DR

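This paper proves that policy gradient with one-hidden-layer neural network approximation for entropy-regularized MDPs converges exponentially fast, in a mean-field regime, to the unique maximizer of the regularized objective, provided the entropic regularization of the mean-field measure is sufficiently strong. This extends prior tabular results to continuous state and action spaces.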

Contribution

It establishes global convergence of policy gradient with neural network approximation in the mean-field regime for MDPs with continuous state and action spaces, including an exponential convergence rate and a sensitivity analysis.

Findings

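01. The objective function increases along the gradient flow.

02. If the entropic regularization of the mean-field measure is sufficiently strong, the gradient flow converges exponentially fast to the unique stationary solution, which is the unique maximizer of the regularized objective.

03. The sensitivity of the value function along the gradient flow is analyzed with respect to the regularization parameters and the initial conditions.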
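The findings refer to "the gradient flow" and "the unique maximizer" without naming the objects involved. One common way to write them for a softmax policy with a mean-field (one-hidden-layer) parametrization is sketched below. All symbols here are assumptions for illustration and may not match the paper's own definitions: τ and σ are regularization strengths, μ is a reference measure on actions, ρ is an initial-state distribution, f is a hidden-unit feature map, and e^{-U} is a normalized reference density on parameters.

```latex
% Entropy-regularized value of a policy \pi (discount \gamma \in [0,1))
V^{\pi}_{\tau}(s) = \mathbb{E}^{\pi}_{s}\Big[\sum_{n\ge 0}\gamma^{n}\big(r(s_n,a_n) - \tau\,\mathrm{KL}(\pi(\cdot\mid s_n)\,\|\,\mu)\big)\Big]

% Mean-field softmax policy induced by a probability measure \nu on parameters \theta
\pi_{\nu}(da\mid s) = \frac{\exp\big(\int f(\theta,s,a)\,\nu(d\theta)\big)\,\mu(da)}{\int \exp\big(\int f(\theta,s,a')\,\nu(d\theta)\big)\,\mu(da')}

% Objective with additional entropic regularization of \nu
J^{\sigma}(\nu) = \int V^{\pi_{\nu}}_{\tau}(s)\,\rho(ds) - \frac{\sigma^{2}}{2}\,\mathrm{KL}\big(\nu\,\|\,e^{-U}\big)

% Particle (McKean-Vlasov) form of the 2-Wasserstein gradient ascent flow of J^{\sigma}
d\theta_t = \Big(\nabla_{\theta}\frac{\delta J^{0}}{\delta \nu}(\nu_t,\theta_t) - \frac{\sigma^{2}}{2}\nabla U(\theta_t)\Big)\,dt + \sigma\,dW_t,
\qquad \nu_t = \mathrm{Law}(\theta_t)
```

Under this reading, finding 02 says that ν_t converges exponentially fast to the unique maximizer of J^σ once σ is large enough.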

Abstract

We study the global convergence of policy gradient for infinite-horizon, continuous state and action space, and entropy-regularized Markov decision processes (MDPs). We consider a softmax policy with (one-hidden layer) neural network approximation in a mean-field regime. An additional entropic regularization of the associated mean-field probability measure is introduced, and the corresponding gradient flow is studied in the 2-Wasserstein metric. We show that the objective function is increasing along the gradient flow. Further, we prove that if the regularization in terms of the mean-field measure is sufficient, the gradient flow converges exponentially fast to the unique stationary solution, which is the unique maximizer of the regularized MDP objective. Lastly, we study the sensitivity of the value function along the gradient flow with respect to regularization parameters and the initial…
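To make the mean-field softmax policy and its regularized gradient flow concrete, here is a minimal particle sketch. It is not the authors' method. It relies on several simplifying assumptions, all hypothetical:

- The discount is γ = 0, a one-step, contextual-bandit special case, so no Q-function has to be estimated.
- Actions lie on a finite grid with a uniform reference measure, and states are weighted uniformly.
- Each hidden unit is f(θ, s, a) = c·tanh(w1·s + w2·a + b).
- The parameter prior is Gaussian, U(θ) = |θ|²/2.

The measure on parameters is represented by N particles. The entropic regularization of that measure becomes Gaussian noise of size σ in an Euler-Maruyama step, which is one common way to simulate such a flow. All function names are illustrative.

```python
import math
import random


def features(theta, s, a):
    """Hypothetical hidden unit f(theta, s, a) = c * tanh(w1*s + w2*a + b) and its theta-gradient."""
    c, w1, w2, b = theta
    t = math.tanh(w1 * s + w2 * a + b)
    dt = c * (1.0 - t * t)  # c * tanh'(z)
    return c * t, (t, dt * s, dt * a, dt)


def policy(particles, s, actions):
    """Softmax policy whose logits are the particle average of f (empirical mean-field measure)."""
    n = len(particles)
    logits = [sum(features(th, s, a)[0] for th in particles) / n for a in actions]
    m = max(logits)  # subtract the max logit for numerical stability
    weights = [math.exp(h - m) for h in logits]
    z = sum(weights)
    return [w / z for w in weights]


def objective(particles, states, actions, reward, tau):
    """gamma = 0 objective: mean over states of E_pi[r] - tau * KL(pi || uniform)."""
    m = len(actions)
    total = 0.0
    for s in states:
        pi = policy(particles, s, actions)
        total += sum(p * (reward(s, a) - tau * math.log(p * m)) for p, a in zip(pi, actions))
    return total / len(states)


def velocity(theta, particles, states, actions, reward, tau):
    """theta-gradient of the flat derivative of the gamma = 0 objective (no nu-entropy term)."""
    m = len(actions)
    g = [0.0] * len(theta)
    for s in states:
        pi = policy(particles, s, actions)
        vals = [reward(s, a) - tau * math.log(p * m) for p, a in zip(pi, actions)]
        baseline = sum(p * v for p, v in zip(pi, vals))
        for p, v, a in zip(pi, vals, actions):
            weight = p * (v - baseline)  # derivative of this state's term w.r.t. the logit of a
            _, df = features(theta, s, a)
            for k, dk in enumerate(df):
                g[k] += weight * dk
    return [x / len(states) for x in g]


def noisy_ascent_step(particles, states, actions, reward, tau, sigma, lr, rng):
    """One Euler-Maruyama step of the particle SDE with Gaussian prior U(theta) = |theta|^2 / 2."""
    new_particles = []
    for th in particles:
        v = velocity(th, particles, states, actions, reward, tau)  # evaluated at the old particles
        new_particles.append([
            x + lr * (vk - 0.5 * sigma ** 2 * x) + sigma * math.sqrt(lr) * rng.gauss(0.0, 1.0)
            for x, vk in zip(th, v)
        ])
    return new_particles


if __name__ == "__main__":
    rng = random.Random(0)
    states, actions = [-1.0, 0.0, 1.0], [-1.0, 0.0, 1.0]
    reward = lambda s, a: -(a - s) ** 2  # toy reward: match the action to the state
    particles = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(20)]
    for _ in range(25):
        particles = noisy_ascent_step(particles, states, actions, reward,
                                      tau=0.1, sigma=0.3, lr=0.02, rng=rng)
    print(objective(particles, states, actions, reward, tau=0.1))
```

The drift uses the per-particle gradient of the flat derivative rather than the raw gradient of the N-particle objective. The two differ by a factor of N, and this scaling keeps each particle's step size independent of how many particles are used. With sigma = 0 the update reduces to deterministic gradient ascent.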

Taxonomy

Topics: Markov Chains and Monte Carlo Methods · Adversarial Robustness in Machine Learning · Stochastic Gradient Optimization Techniques

Methods: Softmax · Multi-partition Embedding Interaction