Derivative-Free Methods for Policy Optimization: Guarantees for Linear Quadratic Systems

Dhruv Malik; Ashwin Pananjady; Kush Bhatia; Koulik Khamaru; Peter L. Bartlett; Martin J. Wainwright

arXiv:1812.08305·cs.LG·May 19, 2020·88 cites

Open Access

TL;DR

This paper analyzes derivative-free policy optimization methods for linear-quadratic systems, providing convergence guarantees and exploring different noise and feedback settings, supported by theoretical analysis and simulations.

Contribution

It offers the first explicit polynomial convergence guarantees for zero-order methods in linear-quadratic control, considering various noise and feedback scenarios.

Findings

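01. The methods converge to within any pre-specified tolerance of the optimal policy, using a number of zero-order evaluations that is polynomial in the error tolerance, dimension, and curvature properties of the problem.

02. Convergence behavior differs between additive driving noise and random initialization, and between one-point and two-point reward feedback.

03. The theory is corroborated by extensive simulations of derivative-free methods on these systems.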

Abstract

We study derivative-free methods for policy optimization over the class of linear policies. We focus on characterizing the convergence rate of these methods when applied to linear-quadratic systems, and study various settings of driving noise and reward feedback. We show that these methods provably converge to within any pre-specified tolerance of the optimal policy with a number of zero-order evaluations that is an explicit polynomial of the error tolerance, dimension, and curvature properties of the problem. Our analysis reveals some interesting differences between the settings of additive driving noise and random initialization, as well as the settings of one-point and two-point reward feedback. Our theory is corroborated by extensive simulations of derivative-free methods on these systems. Along the way, we derive convergence rates for stochastic zero-order optimization algorithms…
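To make the two-point feedback setting concrete, the following is a minimal sketch and not the paper's algorithm or experimental setup. It assumes a hypothetical scalar system with a finite rollout horizon, random initialization drawn uniformly from [-1, 1], and hand-picked step size and smoothing radius; all names (`rollout_cost`, `two_point_estimate`, `optimize`) and constants are illustrative assumptions. The policy is the linear feedback u = -k x, and the gradient is estimated only from cost evaluations at two perturbed gains that share the same initial state.

```python
import random

# Hypothetical scalar system x_{t+1} = A*x_t + B*u_t with linear policy u_t = -k*x_t.
# All constants below are illustrative assumptions, not values from the paper.
A, B = 1.0, 1.0
Q, R = 1.0, 1.0   # stage cost Q*x^2 + R*u^2
HORIZON = 10      # finite rollout length used as the zero-order cost oracle


def rollout_cost(k, x0):
    """Zero-order oracle: total cost of running gain k from initial state x0."""
    x, total = x0, 0.0
    for _ in range(HORIZON):
        u = -k * x
        total += Q * x * x + R * u * u
        x = A * x + B * u
    return total


def two_point_estimate(k, radius, rng):
    """Two-point gradient estimate; both evaluations share the same x0."""
    x0 = rng.uniform(-1.0, 1.0)       # random initialization (assumed distribution)
    u = rng.choice((-1.0, 1.0))       # uniform direction on the unit sphere in 1-D
    diff = rollout_cost(k + radius * u, x0) - rollout_cost(k - radius * u, x0)
    return diff / (2.0 * radius) * u  # the dimension factor d equals 1 here


def optimize(k0, step=0.01, radius=0.05, iters=1000, seed=0):
    """Stochastic descent on the gain k using only cost evaluations."""
    rng = random.Random(seed)
    k = k0
    for _ in range(iters):
        k -= step * two_point_estimate(k, radius, rng)
    return k


k_init = 0.1
k_final = optimize(k_init)
print(k_init, rollout_cost(k_init, 1.0))
print(k_final, rollout_cost(k_final, 1.0))
```

For this particular system the infinite-horizon Riccati gain is about 0.618, so the learned gain should settle near that value. A one-point variant would use a single perturbed evaluation per step instead of the difference of two, which typically gives a noisier estimate.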

Taxonomy

Topics: Stochastic processes and financial applications · Stochastic Gradient Optimization Techniques · Risk and Portfolio Optimization