Linear convergence of a policy gradient method for some finite horizon continuous time control problems
Christoph Reisinger, Wolfgang Stockinger, Yufei Zhang

TL;DR
This paper establishes linear convergence of a policy gradient algorithm for certain finite-horizon stochastic control problems with nonlinear dynamics, providing theoretical support for practical reinforcement learning heuristics.
Contribution
The paper introduces proximal gradient algorithms for feedback controls of nonlinear stochastic control problems and proves that, under suitable conditions, they converge linearly to a stationary point.
Findings
Proximal gradient algorithms converge linearly to stationary points.
Convergence is stable under approximate policy updates.
Results justify heuristics like entropy regularization in reinforcement learning.
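For readers unfamiliar with the term, linear convergence means that the error shrinks geometrically with the iteration count. The display below is a generic statement of that notion, not the paper's theorem: the symbols $\phi^{m}$ (the $m$-th policy iterate), $\phi^{\star}$ (the limiting stationary point), the constants $C$ and $q$, and the unspecified norm are all assumptions introduced here for illustration.

```latex
\[
  \|\phi^{m} - \phi^{\star}\| \;\le\; C\, q^{m},
  \qquad C > 0,\quad q \in [0,1),\quad m = 0, 1, 2, \dots
\]
```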
Abstract
Despite its popularity in the reinforcement learning community, a provably convergent policy gradient method for continuous space-time control problems with nonlinear state dynamics has been elusive. This paper proposes proximal gradient algorithms for feedback controls of finite-time horizon stochastic control problems. The state dynamics are nonlinear diffusions with control-affine drift, and the cost functions are nonconvex in the state and nonsmooth in the control. The system noise can degenerate, which allows for deterministic control problems as special cases. We prove under suitable conditions that the algorithm converges linearly to a stationary point of the control problem, and is stable with respect to policy updates by approximate gradient steps. The convergence result justifies the recent reinforcement learning heuristics that adding entropy regularization or a fictitious…
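To make the idea of a proximal gradient step concrete, here is a minimal finite-dimensional sketch in Python. It is not the paper's algorithm, which acts on feedback controls of a diffusion. It only illustrates the generic pattern "gradient step on the smooth part, proximal map on the nonsmooth part" on a toy cost with an absolute-value penalty. The cost, the names `soft_threshold` and `proximal_gradient`, and all parameter values are assumptions chosen for illustration.

```python
def soft_threshold(v, kappa):
    """Proximal map of kappa * |.| evaluated at v (soft-thresholding)."""
    if v > kappa:
        return v - kappa
    if v < -kappa:
        return v + kappa
    return 0.0


def proximal_gradient(a, c, lam, tau, steps):
    """Run proximal gradient steps on the toy cost
        sum_i 0.5 * a[i] * (u[i] - c[i])**2 + lam * |u[i]|
    starting from u = 0, and return the list of all iterates."""
    u = [0.0] * len(a)
    history = [u]
    for _ in range(steps):
        # Gradient of the smooth (quadratic) part only.
        grad = [ai * (ui - ci) for ai, ui, ci in zip(a, u, c)]
        # Gradient step, then the proximal map of tau * lam * |.|.
        u = [soft_threshold(ui - tau * gi, tau * lam) for ui, gi in zip(u, grad)]
        history.append(u)
    return history


# Hypothetical toy data; tau is chosen below 1 / max(a) so each step contracts.
a = [1.0, 2.0, 4.0]
c = [3.0, -0.1, 1.0]
lam = 0.5
tau = 0.2

# For this separable cost the minimizer is available in closed form.
exact = [soft_threshold(ci, lam / ai) for ai, ci in zip(a, c)]

history = proximal_gradient(a, c, lam, tau, steps=60)
errors = [max(abs(ui - ei) for ui, ei in zip(u, exact)) for u in history]

for k in (0, 10, 20, 40, 60):
    print(k, errors[k])
```

On this toy problem the printed errors shrink by a roughly constant factor per iteration, which is the geometric decay that "linear convergence" refers to. The paper's setting is substantially harder, since the costs there are nonconvex in the state and the iterates are feedback policies rather than vectors.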
Taxonomy
Topics: Adaptive Dynamic Programming Control · Advanced Bandit Algorithms Research · Advanced Control Systems Optimization
Methods: Entropy Regularization
