Linear convergence of a policy gradient method for some finite horizon continuous time control problems

Christoph Reisinger; Wolfgang Stockinger; Yufei Zhang

arXiv:2203.11758·math.OC·December 27, 2022

TL;DR

This paper establishes linear convergence of a policy gradient algorithm for certain finite-horizon stochastic control problems with nonlinear dynamics, providing theoretical support for practical reinforcement learning heuristics.

Contribution

The paper proposes proximal gradient algorithms for feedback controls of nonlinear finite-horizon stochastic control problems and proves that, under suitable conditions, they converge linearly to stationary points.

Findings

01. Proximal gradient algorithms converge linearly to stationary points.

02. Convergence is stable under approximate policy updates.

03. Results justify heuristics like entropy regularization in reinforcement learning.

Abstract

Despite its popularity in the reinforcement learning community, a provably convergent policy gradient method for continuous space-time control problems with nonlinear state dynamics has been elusive. This paper proposes proximal gradient algorithms for feedback controls of finite-time horizon stochastic control problems. The state dynamics are nonlinear diffusions with control-affine drift, and the cost functions are nonconvex in the state and nonsmooth in the control. The system noise can degenerate, which allows for deterministic control problems as special cases. We prove under suitable conditions that the algorithm converges linearly to a stationary point of the control problem, and is stable with respect to policy updates by approximate gradient steps. The convergence result justifies the recent reinforcement learning heuristics that adding entropy regularization or a fictitious…
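
The abstract describes the method only at a high level. As a rough illustration of what a proximal gradient step and linear convergence mean, the sketch below applies the generic iteration x ← prox(x − τ∇f(x)) to a toy finite-dimensional problem with a smooth quadratic term and a nonsmooth absolute-value penalty. This is not the paper's algorithm, which updates feedback controls of a controlled diffusion. The objective, the step size `tau`, and every name in the code are assumptions chosen for illustration.

```python
import math

def soft_threshold(v, t):
    """Proximal map of t*|.|: shrink v toward zero by t."""
    return math.copysign(max(abs(v) - t, 0.0), v)

def prox_gradient(q, c, lam, tau, x0, n_iter):
    """Iterate x <- prox_{tau*g}(x - tau*grad f(x)) and record every iterate.

    Toy objective (an assumption, not taken from the paper):
        f(x) = 0.5 * sum_i q[i] * (x[i] - c[i])**2   (smooth part)
        g(x) = lam * sum_i |x[i]|                    (nonsmooth part)
    """
    x = list(x0)
    history = [list(x)]
    for _ in range(n_iter):
        # Gradient of the smooth part at the current iterate.
        grad = [qi * (xi - ci) for qi, xi, ci in zip(q, x, c)]
        # Gradient step followed by the proximal map of the nonsmooth part.
        x = [soft_threshold(xi - tau * gi, tau * lam) for xi, gi in zip(x, grad)]
        history.append(list(x))
    return history

# Hypothetical problem data, picked so the contraction factor is easy to read off.
q = [1.0, 4.0]
c = [3.0, 0.1]
lam = 0.5
tau = 0.25  # step size 1/max(q)

# Closed-form minimizer of this separable toy problem.
x_star = [soft_threshold(ci, lam / qi) for qi, ci in zip(q, c)]

history = prox_gradient(q, c, lam, tau, x0=[0.0, 2.0], n_iter=40)

# Max-norm distance to the minimizer after each iteration.
errors = [max(abs(xi - si) for xi, si in zip(x, x_star)) for x in history]

for k in (0, 10, 20, 40):
    print(k, errors[k])
```

For this toy problem the error shrinks by at least the factor max_i |1 − τ q_i| = 0.75 per iteration, which is what "linear convergence" refers to: a geometric decay of the distance to the limit point. In the paper the limit is a stationary point of a nonconvex control problem, so the analogy covers only the rate, not the setting.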

Taxonomy

Topics: Adaptive Dynamic Programming Control · Advanced Bandit Algorithms Research · Advanced Control Systems Optimization

Methods: Entropy Regularization