On the Analysis of Model-free Methods for the Linear Quadratic Regulator

Zeyu Jin; Johann Michael Schmitt; Zaiwen Wen

arXiv:2007.03861·math.OC·July 9, 2020·5 cites

Open Access

TL;DR

This paper analyzes the convergence and sample complexity of model-free reinforcement learning algorithms (policy gradient, TD-learning, and actor-critic) on the Linear Quadratic Regulator, and establishes global linear convergence for them.

Contribution

The paper establishes global linear convergence and sample complexity results for these algorithms on LQR, and shows that actor-critic can achieve lower sample complexity than policy gradient.

Findings

01

Actor-critic reduces sample complexity compared to policy gradient.

02

Global linear convergence is established for policy gradient, TD-learning, and actor-critic.

03

The analysis, though preliminary, partly explains the advantage of actor-critic methods.

Abstract

Many reinforcement learning methods achieve great success in practice but lack a theoretical foundation. In this paper, we study their convergence on the Linear Quadratic Regulator (LQR) problem. Global linear convergence properties and sample complexities are established for several popular algorithms, such as the policy gradient algorithm, TD-learning, and the actor-critic (AC) algorithm. Our results show that the actor-critic algorithm can reduce the sample complexity compared with the policy gradient algorithm. Although our analysis is still preliminary, it explains the benefit of the AC algorithm in a certain sense.
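
To make the setting concrete, here is a minimal sketch of model-free policy gradient on a scalar LQR problem. It is an illustration under stated assumptions, not the algorithm analyzed in the paper: the system parameters `A`, `B`, `Q`, `R`, the two-point zeroth-order gradient estimator, the step size, and all function names are hypothetical choices made for this example. The learner only queries rollout costs; the Riccati recursion is used solely as a model-based reference to check the result.

```python
import random

# Hypothetical scalar system x_{t+1} = A*x_t + B*u_t with stage cost Q*x^2 + R*u^2.
A, B, Q, R = 0.9, 1.0, 1.0, 1.0


def rollout_cost(k, x0=1.0, horizon=200):
    """Simulate the linear policy u_t = -k*x_t and return the accumulated cost.

    This acts as the black-box oracle: the learner never reads A or B directly.
    """
    x, total = x0, 0.0
    for _ in range(horizon):
        u = -k * x
        total += Q * x * x + R * u * u
        x = A * x + B * u
    return total


def zeroth_order_gradient(k, radius=0.01, rng=random):
    """Two-point estimate of dC/dk built from cost evaluations only."""
    direction = rng.choice([-1.0, 1.0])  # the unit sphere in one dimension
    diff = rollout_cost(k + radius * direction) - rollout_cost(k - radius * direction)
    return diff / (2.0 * radius) * direction


def policy_gradient(k0=0.0, step=0.01, iterations=300):
    """Plain gradient descent on the gain k, starting from a stabilizing k0."""
    k = k0
    for _ in range(iterations):
        k -= step * zeroth_order_gradient(k)
    return k


def riccati_gain(iterations=1000):
    """Model-based reference: iterate the scalar Riccati recursion to a fixed point."""
    p = Q
    for _ in range(iterations):
        p = Q + A * A * p - (A * B * p) ** 2 / (R + B * B * p)
    return A * B * p / (R + B * B * p)


k_learned = policy_gradient()
k_optimal = riccati_gain()
print(f"learned gain {k_learned:.4f}, Riccati gain {k_optimal:.4f}")
```

In one dimension the random direction is just a sign, so the estimator reduces to a central finite difference and the run is deterministic. In higher dimensions the same estimator becomes noisy, which is where sample complexity questions of the kind the abstract refers to arise.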

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Iterative Learning Control Systems