On the Analysis of Model-free Methods for the Linear Quadratic Regulator
Zeyu Jin, Johann Michael Schmitt, Zaiwen Wen

TL;DR
This paper analyzes the convergence and sample complexity of model-free reinforcement learning algorithms, namely policy gradient, TD-learning, and actor-critic, for the Linear Quadratic Regulator (LQR), providing theoretical insight into their performance.
Contribution
It establishes global linear convergence and sample complexity results for these algorithms on LQR, and shows that actor-critic can require fewer samples than policy gradient.
Findings
Actor-critic reduces sample complexity compared to policy gradient.
Global linear convergence is established for several algorithms.
Preliminary analysis explains the advantages of actor-critic methods.
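To make "global linear convergence" concrete, here is a minimal sketch of exact (model-based) policy gradient descent on a scalar discrete-time LQR. It assumes a standard setup that may differ from the paper's exact formulation: dynamics x_{t+1} = a x_t + b u_t, linear policy u_t = -k x_t, stage cost q x^2 + r u^2, and initial-state second moment sigma0. The function names and the test system (a = 1.2, an open-loop unstable plant) are illustrative choices, not taken from the paper. Model-free methods replace `grad` with an estimate built from sampled trajectories.

```python
# Scalar LQR (assumed setup): x_{t+1} = a*x_t + b*u_t, policy u_t = -k*x_t,
# stage cost q*x_t^2 + r*u_t^2, initial-state second moment E[x_0^2] = sigma0.

def value_coeff(k, a, b, q, r):
    """P_k solving P = q + r*k^2 + (a - b*k)^2 * P (requires |a - b*k| < 1)."""
    ac = a - b * k
    if abs(ac) >= 1:
        raise ValueError("k does not stabilize the system")
    return (q + r * k * k) / (1 - ac * ac)

def cost(k, a, b, q, r, sigma0):
    """Infinite-horizon cost C(k) = P_k * sigma0."""
    return value_coeff(k, a, b, q, r) * sigma0

def grad(k, a, b, q, r, sigma0):
    """Exact gradient dC/dk = 2 * ((r + b^2 P_k) k - b P_k a) * Sigma_k."""
    ac = a - b * k
    p = value_coeff(k, a, b, q, r)
    sigma_k = sigma0 / (1 - ac * ac)  # Sigma_k = sum_t E[x_t^2]
    return 2 * ((r + b * b * p) * k - b * p * a) * sigma_k

def optimal_gain(a, b, q, r, iters=10_000):
    """Reference optimum via fixed-point iteration on the scalar Riccati equation."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return a * b * p / (r + b * b * p)

# Usage: open-loop unstable system (a > 1) with a stabilizing initial gain.
a, b, q, r, sigma0 = 1.2, 1.0, 1.0, 1.0, 1.0
k_star = optimal_gain(a, b, q, r)
c_star = cost(k_star, a, b, q, r, sigma0)

k, eta = 1.0, 0.1  # |1.2 - 1.0| < 1, so k = 1.0 is stabilizing
gaps = []
for _ in range(100):
    gaps.append(cost(k, a, b, q, r, sigma0) - c_star)
    k -= eta * grad(k, a, b, q, r, sigma0)

for t in range(0, 10, 2):
    print(t, gaps[t])  # the cost gap shrinks geometrically
```

The step size is small enough that every iterate stays stabilizing; the cost gap C(k_t) - C(k*) then contracts by a roughly constant factor per step, which is what linear convergence means here.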
Abstract
Many reinforcement learning methods achieve great success in practice but lack a theoretical foundation. In this paper, we study convergence on the Linear Quadratic Regulator (LQR) problem. Global linear convergence and sample complexities are established for several popular algorithms, such as the policy gradient algorithm, TD-learning, and the actor-critic (AC) algorithm. Our results show that the actor-critic algorithm can reduce the sample complexity compared with the policy gradient algorithm. Although our analysis is still preliminary, it explains, in a certain sense, the benefit of the AC algorithm.
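The actor-critic structure can be sketched on the same kind of scalar LQR. This is an illustrative stand-in, not the paper's algorithm: the critic is swapped for LSTD-Q (a least-squares relative of TD-learning) fitting a quadratic Q-function Q_k(x, u) = t1 x^2 + t2 x u + t3 u^2 from simulated transitions, and the actor takes a natural-gradient-style step using only the critic's parameters. The learner touches the system only through `env_step`; the model parameters are used only inside the simulator and the reference `optimal_gain`. All helper names (`make_env`, `lstdq_critic`, `actor_critic`, `solve3`) and the test system are hypothetical.

```python
import random

def solve3(A, y):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(y)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[piv] = M[piv], M[col]
        for i in range(col + 1, n):
            f = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= f * M[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def make_env(a, b, q, r):
    """Hypothetical scalar LQR simulator: step(x, u) -> (stage cost, next state)."""
    def step(x, u):
        return q * x * x + r * u * u, a * x + b * u
    return step

def features(x, u):
    # For a linear policy, Q_k is exactly quadratic in (x, u).
    return [x * x, x * u, u * u]

def lstdq_critic(env_step, k, n_samples, rng):
    """Critic: LSTD-Q estimate of (t1, t2, t3) for the policy u = -k*x."""
    A = [[0.0] * 3 for _ in range(3)]
    y = [0.0] * 3
    for _ in range(n_samples):
        x = rng.uniform(-1.0, 1.0)
        u = -k * x + rng.uniform(-1.0, 1.0)  # exploratory action
        c, x_next = env_step(x, u)
        phi = features(x, u)
        phi_next = features(x_next, -k * x_next)  # next action from current policy
        for i in range(3):
            y[i] += phi[i] * c
            for j in range(3):
                A[i][j] += phi[i] * (phi[j] - phi_next[j])
    return solve3(A, y)

def actor_critic(env_step, k0, eta=0.2, iters=50, n_samples=20, seed=0):
    """Actor: k <- k - eta * E_k, where E_k = t3*k - t2/2 is the scalar
    analogue of (R + B'P_k B)K - B'P_k A, read off the critic."""
    rng = random.Random(seed)
    k = k0
    for _ in range(iters):
        _, t2, t3 = lstdq_critic(env_step, k, n_samples, rng)
        k -= eta * (t3 * k - t2 / 2.0)
    return k

def optimal_gain(a, b, q, r, iters=10_000):
    """Model-based reference optimum (scalar Riccati fixed point)."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return a * b * p / (r + b * b * p)

a, b, q, r = 1.2, 1.0, 1.0, 1.0
env = make_env(a, b, q, r)
theta = lstdq_critic(env, 1.0, 20, random.Random(1))  # critic for k = 1.0
k = actor_critic(env, k0=1.0)
k_star = optimal_gain(a, b, q, r)
print(theta, k, k_star)
```

Because the simulated dynamics are noiseless and the features represent Q_k exactly, this critic recovers Q_k exactly; with noisy transitions, the critic's estimation error is what a sample-complexity analysis would have to control.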
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Iterative Learning Control Systems
