Global optimality of softmax policy gradient with single hidden layer neural networks in the mean-field regime

Andrea Agazzi; Jianfeng Lu

arXiv:2010.11858·cs.LG·October 23, 2020

TL;DR

This paper analyzes the training dynamics of wide single hidden layer neural networks with softmax policy gradient methods in reinforcement learning, proving global optimality of fixed points in the mean-field regime.

Contribution

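The paper shows that, in the mean-field regime, the training dynamics of softmax policy gradient form a Wasserstein gradient flow of distributions in parameter space, and it proves that the fixed points of this flow are globally optimal under mild conditions on the initialization.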

Findings

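01. In the mean-field regime, policy training dynamics are described by a Wasserstein gradient flow of distributions in parameter space.

02. The fixed points of these dynamics are proven to be globally optimal under mild conditions on the initialization.

03. The analysis covers wide single hidden layer neural networks with a softmax policy, where exploration is encouraged through entropy regularization.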
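
For orientation, a Wasserstein gradient flow for a maximization problem can be written generically as the continuity equation below. The notation is an assumption made for illustration and is not taken from the paper: \nu_t is the distribution of hidden-unit parameters \theta at training time t, and J[\nu] is the (entropy-regularized) expected return viewed as a functional of that distribution.

```latex
% Generic form only; symbols \nu_t, \theta, J are illustrative assumptions.
% Each parameter particle follows gradient ascent on the first variation of J:
%   \dot{\theta}_t = \nabla_\theta \frac{\delta J}{\delta \nu}[\nu_t](\theta_t),
% so the law \nu_t of the particles satisfies the continuity equation
\partial_t \nu_t
  = -\,\nabla_\theta \cdot \left( \nu_t \, \nabla_\theta \frac{\delta J}{\delta \nu}[\nu_t] \right).
```

A fixed point of such a flow is a distribution for which the right-hand side vanishes. The paper's result concerns when fixed points of its dynamics are globally optimal.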

Abstract

We study the problem of policy optimization for infinite-horizon discounted Markov Decision Processes with softmax policy and nonlinear function approximation trained with policy gradient algorithms. We concentrate on the training dynamics in the mean-field regime, modeling, e.g., the behavior of wide single hidden layer neural networks, when exploration is encouraged through entropy regularization. The dynamics of these models is established as a Wasserstein gradient flow of distributions in parameter space. We further prove global optimality of the fixed points of this dynamics under mild conditions on their initialization.
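
The setting described in the abstract can be illustrated with a minimal NumPy sketch of a softmax policy whose logits come from a wide single hidden layer network with mean-field (1/N) scaling. Everything specific here is an assumption made for illustration and not the paper's construction: the tanh activation, the feature vectors, the function names, and the sign convention `r - tau * log(pi)` for the entropy-regularized reward.

```python
import numpy as np


def mean_field_logits(features, weights, outputs):
    """Logits f(s, a) = (1/N) * sum_i c_i * tanh(w_i . phi(s, a)).

    features: (A, d) array, one hypothetical feature vector phi(s, a) per action
    weights:  (N, d) array of hidden-unit weights w_i
    outputs:  (N,) array of output weights c_i
    """
    hidden = np.tanh(features @ weights.T)      # shape (A, N)
    return hidden @ outputs / weights.shape[0]  # 1/N mean-field scaling


def softmax_policy(logits):
    """Softmax over actions, shifted by the max for numerical stability."""
    shifted = logits - logits.max()
    unnormalized = np.exp(shifted)
    return unnormalized / unnormalized.sum()


def entropy_regularized_reward(reward, policy, tau):
    """Per-action reward with an entropy bonus: r(s, a) - tau * log pi(a | s)."""
    return reward - tau * np.log(policy)


# Illustrative sizes only: 3 actions, 4 features, 1000 hidden units.
rng = np.random.default_rng(0)
num_actions, dim, width = 3, 4, 1000
features = rng.normal(size=(num_actions, dim))
weights = rng.normal(size=(width, dim))
outputs = rng.normal(size=width)

policy = softmax_policy(mean_field_logits(features, weights, outputs))
regularized = entropy_regularized_reward(np.zeros(num_actions), policy, tau=0.1)
```

With the 1/N scaling, the logits depend on the hidden units only through their empirical distribution, which is why the wide-network limit can be described as an evolution of distributions in parameter space.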

Videos

Global optimality of softmax policy gradient with single hidden layer neural networks in the mean-field regime · SlidesLive

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Reinforcement Learning in Robotics · Neural Networks and Applications

Methods: Softmax