Finite-time Analysis of Approximate Policy Iteration for the Linear Quadratic Regulator

Karl Krauth; Stephen Tu; Benjamin Recht

arXiv:1905.12842·cs.LG·May 31, 2019·22 cites

TL;DR

This paper provides a finite-time analysis of approximate policy iteration for the Linear Quadratic Regulator, highlighting the sample complexity of policy evaluation and proposing an adaptive exploration method with improved regret bounds.

Contribution

The paper quantifies the sample complexity of approximate policy iteration for LQR and introduces an adaptive $\varepsilon$-greedy exploration procedure with better regret performance.

Findings

1. Policy evaluation dominates the sample complexity, requiring $(n+d)^3/\varepsilon^2$ samples per step.
2. Only $\log(1/\varepsilon)$ policy improvement steps are needed.
3. The adaptive $\varepsilon$-greedy procedure achieves regret scaling as $T^{2/3}$, improving on previous results.
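Multiplying the per-step cost of policy evaluation by the number of policy improvement steps gives the overall sample complexity stated in the abstract, read here as an order-of-magnitude statement:

$$
\underbrace{\frac{(n+d)^3}{\varepsilon^2}}_{\text{samples per evaluation step}} \times \underbrace{\log(1/\varepsilon)}_{\text{improvement steps}} = (n+d)^3\,\varepsilon^{-2}\log(1/\varepsilon).
$$

For example, halving $\varepsilon$ roughly quadruples the samples needed per evaluation step but adds only about $\log 2$ improvement steps, which is why evaluation dominates.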

Abstract

We study the sample complexity of approximate policy iteration (PI) for the Linear Quadratic Regulator (LQR), building on a recent line of work using LQR as a testbed to understand the limits of reinforcement learning (RL) algorithms on continuous control tasks. Our analysis quantifies the tension between policy improvement and policy evaluation, and suggests that policy evaluation is the dominant factor in terms of sample complexity. Specifically, we show that to obtain a controller that is within $\varepsilon$ of the optimal LQR controller, each step of policy evaluation requires at most $(n + d)^{3} / \varepsilon^{2}$ samples, where $n$ is the dimension of the state vector and $d$ is the dimension of the input vector. On the other hand, only $\log(1/\varepsilon)$ policy improvement steps suffice, resulting in an overall sample complexity of $(n + d)^{3} \varepsilon^{-2} \log(1/\varepsilon)$. We…
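To make the evaluation/improvement loop concrete, here is a minimal sketch under stated assumptions, not the authors' estimator. It assumes dynamics $x_{t+1} = Ax_t + Bu_t + w_t$ with $w_t \sim \mathcal{N}(0, \sigma_w^2 I)$ and $\sigma_w$ known, stage cost $x^\top Q x + u^\top R u$, and linear policies $u = Kx$. Under these assumptions the (relative, average-cost) Q-function of $K$ is $z^\top \Theta z$ with $z = [x; u]$ and $\Theta = \mathrm{blkdiag}(Q, R) + [A\ B]^\top P_K [A\ B]$, where $P_K$ solves the Lyapunov equation for the closed loop $A + BK$. Policy evaluation is done with LSTD-Q on quadratic features (a standard choice, assumed here), and improvement is the greedy update $K \leftarrow -\Theta_{uu}^{-1}\Theta_{ux}$. All function names (`svec`, `smat`, `quad_features`, `lstdq`, `improve`, `approx_policy_iteration`) are hypothetical, and the data are i.i.d. one-step transitions rather than a single trajectory, purely to keep the sketch short and vectorized.

```python
import numpy as np

# Hypothetical sketch of approximate PI for LQR (not the paper's exact estimator).
# Assumed model: x' = A x + B u + w, w ~ N(0, sigma_w^2 I), cost x'Qx + u'Ru,
# linear policies u = K x with K of shape (d, n).


def svec(M):
    """Upper triangle of symmetric M, off-diagonals scaled by sqrt(2),
    so that svec(M) @ svec(N) == trace(M @ N)."""
    rows, cols = np.triu_indices(M.shape[0])
    scale = np.where(rows == cols, 1.0, np.sqrt(2.0))
    return scale * M[rows, cols]


def smat(v, p):
    """Inverse of svec: rebuild the symmetric p x p matrix."""
    rows, cols = np.triu_indices(p)
    scale = np.where(rows == cols, 1.0, 1.0 / np.sqrt(2.0))
    M = np.zeros((p, p))
    M[rows, cols] = scale * v
    return M + np.triu(M, 1).T


def quad_features(Z):
    """Row-wise svec(z z^T) for every row z of Z."""
    rows, cols = np.triu_indices(Z.shape[1])
    scale = np.where(rows == cols, 1.0, np.sqrt(2.0))
    return scale * Z[:, rows] * Z[:, cols]


def lstdq(A, B, Q, R, K, num_samples, sigma_w, sigma_u, rng):
    """LSTD-Q estimate of Theta, where Q_K(x, u) = z^T Theta z with z = [x; u]."""
    n, d = B.shape
    # Simplification (assumption): i.i.d. one-step transitions from N(0, I) states
    # instead of a single trajectory; inputs get Gaussian exploration noise.
    X = rng.standard_normal((num_samples, n))
    U = X @ K.T + sigma_u * rng.standard_normal((num_samples, d))
    X_next = X @ A.T + U @ B.T + sigma_w * rng.standard_normal((num_samples, n))
    IK = np.vstack([np.eye(n), K])  # maps x to [x; K x]
    Phi = quad_features(np.hstack([X, U]))
    # Next-step features under the evaluated policy, minus the constant that the
    # (assumed known) process noise adds to E[z' z'^T].
    Psi = quad_features(X_next @ IK.T) - svec(sigma_w**2 * IK @ IK.T)
    costs = np.einsum("ti,ij,tj->t", X, Q, X) + np.einsum("ti,ij,tj->t", U, R, U)
    # LSTD-Q equations: sum_t phi_t (phi_t - psi_t)^T theta = sum_t phi_t c_t.
    theta = np.linalg.lstsq(Phi.T @ (Phi - Psi), Phi.T @ costs, rcond=None)[0]
    return smat(theta, n + d)


def improve(Theta, n):
    """Greedy step: argmin_u z^T Theta z gives u = -Theta_uu^{-1} Theta_ux x."""
    return -np.linalg.solve(Theta[n:, n:], Theta[n:, :n])


def approx_policy_iteration(A, B, Q, R, K0, iters=5, num_samples=200_000,
                            sigma_w=1.0, sigma_u=1.0, seed=0):
    """Alternate LSTD-Q evaluation and greedy improvement from a stabilizing K0."""
    rng = np.random.default_rng(seed)
    K = K0
    for _ in range(iters):
        Theta = lstdq(A, B, Q, R, K, num_samples, sigma_w, sigma_u, rng)
        K = improve(Theta, A.shape[0])
    return K
```

For instance, with the scalar system $A = 0.9$, $B = Q = R = 1$ and $K_0 = 0$, the returned gain should land close to the exact LQR gain of about $-0.538$. The split mirrors the trade-off in the abstract: each call to `lstdq` consumes `num_samples` transitions, while only a handful of `improve` steps are needed.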

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Machine Learning and Algorithms