Least-Squares Temporal Difference Learning for the Linear Quadratic Regulator
Stephen Tu, Benjamin Recht

TL;DR
This paper provides the first finite-time analysis of the sample complexity for Least-Squares Temporal Difference learning applied to the Linear Quadratic Regulator, advancing understanding of RL in continuous control tasks.
Contribution
It introduces a finite-sample analysis of LSTD for LQR, including a new eigenvalue concentration result for empirical covariance matrices in stochastic processes.
Findings
Finite-time bounds for LSTD in LQR
Eigenvalue concentration characterization for stochastic processes
Experimental validation of theoretical results
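The eigenvalue concentration finding can be illustrated numerically. The following is a minimal sketch, not the paper's construction: it assumes a stable linear process x_{t+1} = L x_t + w_t with Gaussian noise, and for simplicity uses the state itself as the covariate rather than the features used for LSTD. The function name `min_eig_empirical_cov` and the parameters `L`, `sigma`, and `T` are hypothetical.

```python
import numpy as np

def min_eig_empirical_cov(L, sigma, T, rng):
    """Smallest eigenvalue of (1/T) * sum_t x_t x_t^T along one sample path.

    Assumed process (illustrative only): x_{t+1} = L x_t + sigma * w_t,
    with w_t standard Gaussian and x_0 = 0.
    """
    n = L.shape[0]
    x = np.zeros(n)
    S = np.zeros((n, n))
    for _ in range(T):
        x = L @ x + sigma * rng.standard_normal(n)
        S += np.outer(x, x)
    # eigvalsh returns eigenvalues of a symmetric matrix in ascending order
    return np.linalg.eigvalsh(S / T)[0]
```

With a single sample the empirical covariance is rank one, so its minimum eigenvalue is zero in any dimension above one; along a longer path of a fast-mixing process it moves away from zero, which is the behavior the concentration result quantifies.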
Abstract
Reinforcement learning (RL) has been successfully used to solve many continuous control tasks. Despite its impressive results, however, fundamental questions regarding the sample complexity of RL on continuous problems remain open. We study the performance of RL in this setting by considering the behavior of the Least-Squares Temporal Difference (LSTD) estimator on the classic Linear Quadratic Regulator (LQR) problem from optimal control. We give the first finite-time analysis of the number of samples needed to estimate the value function for a fixed static state-feedback policy to within ε-relative error. In the process of deriving our result, we give a general characterization for when the minimum eigenvalue of the empirical covariance matrix formed along the sample path of a fast-mixing stochastic process concentrates above zero, extending a result by Koltchinskii and…
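To make the estimation problem concrete, the sketch below applies LSTD to a fixed state-feedback policy u = Kx on a small LQR instance. It is an illustration under stated assumptions, not the authors' code: it assumes a discounted cost with factor `gamma`, Gaussian process noise, and a value function of the form x^T P x + c fitted with quadratic monomial features plus a constant. All function and parameter names (`features`, `lstd_lqr`, `true_P`, `sigma`, `T`) are hypothetical.

```python
import numpy as np

def features(x):
    """Quadratic monomials x_i x_j (i <= j) plus a constant term.

    Off-diagonal monomials are doubled so the fitted weights are
    exactly the upper-triangular entries of P in x^T P x.
    """
    rows, cols = np.triu_indices(x.shape[0])
    scale = np.where(rows == cols, 1.0, 2.0)
    return np.concatenate([np.outer(x, x)[rows, cols] * scale, [1.0]])

def lstd_lqr(A, B, K, Q, R, gamma, sigma, T, rng):
    """Estimate P from one trajectory of length T under u = K x."""
    n = A.shape[0]
    L = A + B @ K            # closed-loop dynamics
    M = Q + K.T @ R @ K      # per-step cost is x^T M x
    d = n * (n + 1) // 2 + 1
    G = np.zeros((d, d))
    b = np.zeros(d)
    x = np.zeros(n)
    for _ in range(T):
        x_next = L @ x + sigma * rng.standard_normal(n)
        phi, phi_next = features(x), features(x_next)
        # LSTD normal equations: sum phi (phi - gamma phi')^T w = sum phi * cost
        G += np.outer(phi, phi - gamma * phi_next)
        b += phi * (x @ M @ x)
        x = x_next
    w = np.linalg.lstsq(G, b, rcond=None)[0]
    P = np.zeros((n, n))
    P[np.triu_indices(n)] = w[:-1]   # last weight is the constant offset
    return P + P.T - np.diag(np.diag(P))

def true_P(A, B, K, Q, R, gamma, iters=2000):
    """Fixed-point iteration for P = M + gamma * L^T P L (needs gamma * rho(L)^2 < 1)."""
    L = A + B @ K
    M = Q + K.T @ R @ K
    P = M.copy()
    for _ in range(iters):
        P = M + gamma * L.T @ P @ L
    return P
```

Comparing `lstd_lqr` against `true_P` for increasing `T` gives an empirical view of the relative error that the finite-time analysis bounds.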
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Age of Information Optimization
