Towards a Theoretical Foundation of Policy Optimization for Learning Control Policies

Bin Hu; Kaiqing Zhang; Na Li; Mehran Mesbahi; Maryam Fazel; Tamer Başar

arXiv:2210.04810·math.OC·October 11, 2022·6 cites

TL;DR

This paper reviews recent theoretical advances in policy optimization methods for control and reinforcement learning, focusing on convergence, landscape, and robustness in continuous control problems.

Contribution

It provides an interdisciplinary survey connecting control theory, reinforcement learning, and optimization, highlighting new theoretical insights into policy optimization.

Findings

1. Analysis of the optimization landscape and global convergence for control policies
2. Results on sample complexity in continuous control tasks
3. Discussion of stability and robustness in learning-based control

Abstract

Gradient-based methods have been widely used for system design and optimization in diverse application domains. Recently, there has been a renewed interest in studying theoretical properties of these methods in the context of control and reinforcement learning. This article surveys some of the recent developments in policy optimization, a gradient-based iterative approach for feedback control synthesis, popularized by successes of reinforcement learning. We take an interdisciplinary perspective in our exposition that connects control theory, reinforcement learning, and large-scale optimization. We review a number of recently developed theoretical results on the optimization landscape, global convergence, and sample complexity of gradient-based methods for various continuous control problems such as the linear quadratic regulator (LQR), $H_{\infty}$ control, risk-sensitive…
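
As a rough illustration of what "policy optimization" means for the LQR case mentioned in the abstract, the sketch below runs plain gradient descent directly on a scalar feedback gain and compares the result with the gain obtained from a Riccati iteration. It is a minimal example under stated assumptions, not the paper's algorithm: the system parameters, step size, iteration counts, and all function names are hypothetical choices made for this example, and the gradient is computed exactly from the model rather than estimated from samples.

```python
# Illustrative sketch: policy gradient descent on a scalar discrete-time LQR problem.
# Dynamics x_{t+1} = a*x_t + b*u_t, policy u_t = -k*x_t, stage cost q*x_t^2 + r*u_t^2.
# All numeric values and function names are assumptions made for this example.


def lqr_cost_and_grad(k, a, b, q, r, sigma0=1.0):
    """Return the infinite-horizon cost of gain k and its exact derivative in k."""
    acl = a - b * k  # closed-loop multiplier
    if abs(acl) >= 1.0:
        raise ValueError("gain k does not stabilize the system")
    denom = 1.0 - acl * acl
    p = (q + r * k * k) / denom   # scalar Lyapunov equation for the cost-to-go
    sigma = sigma0 / denom        # sum over time of the state second moment
    cost = p * sigma0
    grad = 2.0 * ((r + b * b * p) * k - b * p * a) * sigma
    return cost, grad


def policy_gradient_descent(k0, a, b, q, r, step=0.05, iters=500):
    """Run gradient descent on the gain, starting from a stabilizing k0."""
    k = k0
    costs = []
    for _ in range(iters):
        cost, grad = lqr_cost_and_grad(k, a, b, q, r)
        costs.append(cost)
        k -= step * grad
    return k, costs


def riccati_gain(a, b, q, r, iters=1000):
    """Reference gain from fixed-point iteration of the scalar Riccati equation."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return a * b * p / (r + b * b * p)


if __name__ == "__main__":
    a, b, q, r = 1.2, 1.0, 1.0, 1.0  # open-loop unstable since |a| > 1
    k_pg, costs = policy_gradient_descent(k0=1.0, a=a, b=b, q=q, r=r)
    print("policy gradient gain:", k_pg)
    print("Riccati gain:        ", riccati_gain(a, b, q, r))
```

For this scalar instance the two gains should agree to numerical precision, which is the kind of behavior the global convergence results surveyed for LQR describe. The initial gain must be stabilizing, since the cost is infinite otherwise.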

Taxonomy

Topics: Advanced Bandit Algorithms Research · Advanced Fluorescence Microscopy Techniques · Stochastic Gradient Optimization Techniques