A homotopic approach to policy gradients for linear quadratic regulators with nonlinear controls
Craig Xu Chen, Andrea Agazzi

TL;DR
This paper introduces a homotopic policy gradient method that gradually increases the discount factor, enabling convergence to the global optimum in nonlinear policy spaces for the LQR problem.
Contribution
It proposes a novel homotopic approach to policy gradients that overcomes local minima issues in nonlinear policy classes for LQR.
Findings
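Homotopic policy gradient converges to the global optimum of the undiscounted LQR problem for nonlinear policies.
A counterexample shows that extending the policy class from linear to piecewise linear functions introduces local minima for the policy gradient algorithm.
The method applies to a large class of Lipschitz, nonlinear policies.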
Abstract
We study the convergence of deterministic policy gradient algorithms in continuous state and action space for the prototypical Linear Quadratic Regulator (LQR) problem when the search space is not limited to the family of linear policies. We first provide a counterexample showing that extending the policy class to piecewise linear functions results in local minima of the policy gradient algorithm. To solve this problem, we develop a new approach that involves sequentially increasing a discount factor between iterations of the original policy gradient algorithm. We finally prove that this homotopic variant of policy gradient methods converges to the global optimum of the undiscounted Linear Quadratic Regulator problem for a large class of Lipschitz, non-linear policies.
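The idea of sequentially increasing the discount factor between policy gradient runs can be illustrated with a minimal sketch. This is not the authors' algorithm or setting: it assumes a scalar system x_{t+1} = A x_t + B u_t with a linear policy u = -k x, uses the closed-form discounted cost per unit x_0^2, and picks a hypothetical linear schedule for the discount factor. All names (`cost`, `grad`, `policy_gradient`, `homotopic_policy_gradient`) and constants are illustrative assumptions. The sketch only shows the continuation mechanism: each stage is warm-started from the previous one, and the final stage solves the undiscounted problem.

```python
import numpy as np

# Hypothetical scalar LQR instance (assumption, not from the paper).
# The open-loop system is unstable because |A| > 1.
A, B, Q, R = 1.2, 1.0, 1.0, 1.0

def cost(k, gamma):
    """Discounted cost per unit x0^2 of the linear policy u = -k x."""
    c = A - B * k                      # closed-loop gain
    denom = 1.0 - gamma * c**2
    return (Q + R * k**2) / denom if denom > 0 else np.inf

def grad(k, gamma):
    """Derivative of cost(k, gamma) with respect to k (where the cost is finite)."""
    c = A - B * k
    denom = 1.0 - gamma * c**2
    return 2 * R * k / denom - 2 * gamma * B * c * (Q + R * k**2) / denom**2

def policy_gradient(k, gamma, iters=500, tol=1e-8):
    """Gradient descent on k at a fixed discount factor, with Armijo backtracking."""
    for _ in range(iters):
        g = grad(k, gamma)
        if abs(g) < tol:
            break
        step = 1.0
        # Halve the step until the cost is finite and decreases sufficiently.
        while step > 1e-12 and cost(k - step * g, gamma) > cost(k, gamma) - 1e-4 * step * g * g:
            step *= 0.5
        if step <= 1e-12:
            break
        k -= step * g
    return k

def homotopic_policy_gradient(k0=0.0, gamma0=0.5, stages=11):
    """Run policy gradient along an increasing discount schedule ending at gamma = 1."""
    k = k0
    for gamma in np.linspace(gamma0, 1.0, stages):
        if not np.isfinite(cost(k, gamma)):
            raise ValueError("warm start has infinite cost at this discount factor")
        k = policy_gradient(k, gamma)   # warm start from the previous stage
    return k

k_final = homotopic_policy_gradient()
```

In this toy instance the initial gain k = 0 has infinite undiscounted cost, so gradient descent at discount factor 1 cannot start from it, while the discounted problem at 0.5 is well defined there. The local minima that motivate the paper arise only for nonlinear policy classes and are not reproduced by this linear example.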
Taxonomy
Topics: Mathematical Biology Tumor Growth
