Least-Squares Temporal Difference Learning for the Linear Quadratic Regulator

Stephen Tu; Benjamin Recht

arXiv:1712.08642 · cs.LG · December 27, 2017 · 27 cites

TL;DR

This paper gives the first finite-time analysis of how many samples the Least-Squares Temporal Difference (LSTD) estimator needs to estimate the value function of a fixed static state-feedback policy on the Linear Quadratic Regulator (LQR) to within $\varepsilon$-relative error.

Contribution

The paper introduces a finite-sample analysis of LSTD for LQR. Along the way, it characterizes when the minimum eigenvalue of the empirical covariance matrix formed along the sample path of a fast-mixing stochastic process concentrates above zero.

Findings

01. Finite-time bounds for LSTD in LQR

02. Eigenvalue concentration characterization for stochastic processes

03. Experimental validation of theoretical results

Abstract

Reinforcement learning (RL) has been successfully used to solve many continuous control tasks. Despite its impressive results, however, fundamental questions regarding the sample complexity of RL on continuous problems remain open. We study the performance of RL in this setting by considering the behavior of the Least-Squares Temporal Difference (LSTD) estimator on the classic Linear Quadratic Regulator (LQR) problem from optimal control. We give the first finite-time analysis of the number of samples needed to estimate the value function for a fixed static state-feedback policy to within $\varepsilon$-relative error. In the process of deriving our result, we give a general characterization for when the minimum eigenvalue of the empirical covariance matrix formed along the sample path of a fast-mixing stochastic process concentrates above zero, extending a result by Koltchinskii and…
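To make the setting in the abstract concrete, here is a minimal sketch of an LSTD estimate of the value function of a fixed state-feedback policy u = Kx on a small LQR instance. It is illustrative only and not necessarily the estimator analyzed in the paper. The discount factor `gamma`, the Gaussian process noise with scale `sigma`, the quadratic-plus-constant feature map, the example matrices, and all function names are assumptions introduced for this example.

```python
import numpy as np

def quad_features(x):
    """Monomials x_i * x_j for i <= j, plus a constant 1 (assumed feature map)."""
    iu = np.triu_indices(x.shape[0])
    return np.append(np.outer(x, x)[iu], 1.0)

def lstd_quadratic_value(L, M, gamma, sigma, T, rng):
    """LSTD estimate of P, where V(x) = x^T P x + const for closed loop L and cost x^T M x."""
    n = L.shape[0]
    d = n * (n + 1) // 2 + 1
    A_hat = np.zeros((d, d))
    b_hat = np.zeros(d)
    x = np.zeros(n)
    phi = quad_features(x)
    for _ in range(T):
        cost = x @ M @ x
        x_next = L @ x + sigma * rng.standard_normal(n)  # assumed Gaussian noise
        phi_next = quad_features(x_next)
        # Standard LSTD normal equations: sum_t phi_t (phi_t - gamma * phi_{t+1})^T w = sum_t phi_t c_t
        A_hat += np.outer(phi, phi - gamma * phi_next)
        b_hat += phi * cost
        x, phi = x_next, phi_next
    w = np.linalg.lstsq(A_hat, b_hat, rcond=None)[0]
    # Rebuild symmetric P: the weight on x_i * x_j (i < j) equals 2 * P_ij.
    U = np.zeros((n, n))
    U[np.triu_indices(n)] = w[:-1]
    return (U + U.T) / 2

def true_value_matrix(L, M, gamma, iters=2000):
    """Fixed-point iteration for P = M + gamma * L^T P L (converges when gamma * rho(L)^2 < 1)."""
    P = np.zeros_like(M)
    for _ in range(iters):
        P = M + gamma * L.T @ P @ L
    return P

# Hypothetical example system, chosen only so that the closed loop is stable.
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
K = np.array([[-0.1, -0.2]])
Q = np.eye(2)
R = np.array([[1.0]])

L = A + B @ K            # closed-loop dynamics under u = K x
M = Q + K.T @ R @ K      # per-step cost x^T M x under the policy

rng = np.random.default_rng(0)
P_hat = lstd_quadratic_value(L, M, gamma=0.9, sigma=1.0, T=20000, rng=rng)
P_true = true_value_matrix(L, M, gamma=0.9)
rel_err = np.linalg.norm(P_hat - P_true) / np.linalg.norm(P_true)
print("relative error in P:", rel_err)
```

Rerunning the sketch with different trajectory lengths `T` gives an informal view of how the relative error in the estimated matrix shrinks as more samples are collected, which is the quantity the finite-time analysis concerns.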

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Age of Information Optimization