Online Policy Gradient for Model Free Learning of Linear Quadratic Regulators with $\sqrt{T}$ Regret

Asaf Cassel (1), Tomer Koren ((1) School of Computer Science, Tel Aviv University)

arXiv:2102.12608·cs.LG·February 26, 2021·1 cites

Open Access

TL;DR

This paper introduces a model-free policy gradient algorithm for linear quadratic regulators that achieves near-optimal regret scaling with the time horizon, eliminating the need for costly system identification.

Contribution

It presents the first model-free approach with regret bounds comparable to model-based methods for LQR control, using a novel analysis of exploration costs.

Findings

01

Achieves $\sqrt{T}$ regret scaling with the time horizon T.

02

Introduces an efficient policy gradient method for LQR control.

03

Provides a tighter analysis of exploration costs in policy space.
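
For reference, the regret claim can be read against the standard LQR formulation. The notation below ($A$, $B$, $Q$, $R$, $w_t$, $J^\star$, $\mathcal{R}_T$) is assumed for illustration and is not quoted from the paper, whose exact definitions and constants may differ.

```latex
\begin{align*}
  x_{t+1} &= A x_t + B u_t + w_t
    && \text{(linear dynamics with noise } w_t\text{)} \\
  c_t &= x_t^\top Q x_t + u_t^\top R u_t
    && \text{(fixed quadratic cost)} \\
  \mathcal{R}_T &= \sum_{t=1}^{T} c_t \;-\; T J^\star
    && \text{(regret against the optimal average cost } J^\star\text{)}
\end{align*}
```

Under this reading, a $\sqrt{T}$ regret guarantee means that $\mathcal{R}_T$ grows on the order of $\sqrt{T}$, so the average excess cost per step vanishes as $T$ grows.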

Abstract

We consider the task of learning to control a linear dynamical system under fixed quadratic costs, known as the Linear Quadratic Regulator (LQR) problem. While model-free approaches are often favorable in practice, thus far only model-based methods, which rely on costly system identification, have been shown to achieve regret that scales with the optimal dependence on the time horizon T. We present the first model-free algorithm that achieves similar regret guarantees. Our method relies on an efficient policy gradient scheme, and a novel and tighter analysis of the cost of exploration in policy space in this setting.
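
To make the idea of policy gradient in policy space concrete, here is a minimal sketch of a generic zeroth-order (two-point) gradient estimate for a linear policy $u_t = -K x_t$. It is not the paper's algorithm: the function names, the noise-free finite rollout, and the step-size and smoothing-radius values are all assumptions chosen for illustration. The matrices `A` and `B` are used only to simulate the system, and the learner sees nothing but observed costs.

```python
import numpy as np

def rollout_cost(A, B, Q, R, K, x0, horizon):
    """Average quadratic cost of the linear policy u_t = -K x_t.

    Assumption: noise-free rollout of fixed length, for simplicity.
    """
    x = x0.copy()
    total = 0.0
    for _ in range(horizon):
        u = -K @ x
        total += x @ Q @ x + u @ R @ u
        x = A @ x + B @ u
    return total / horizon

def two_point_gradient(cost_fn, K, radius, rng):
    """Two-point zeroth-order gradient estimate at K.

    Perturbs K along a random direction of unit Frobenius norm and
    uses only the two resulting cost evaluations (no model access).
    """
    U = rng.standard_normal(K.shape)
    U /= np.linalg.norm(U)
    diff = cost_fn(K + radius * U) - cost_fn(K - radius * U)
    return (K.size / (2.0 * radius)) * diff * U

def policy_gradient_descent(A, B, Q, R, K0, x0, steps=200, lr=0.05,
                            radius=0.05, horizon=50, seed=0):
    """Plain gradient descent on K using the estimate above.

    All hyperparameter defaults are hypothetical, not from the paper.
    """
    rng = np.random.default_rng(seed)
    cost_fn = lambda K: rollout_cost(A, B, Q, R, K, x0, horizon)
    K = K0.copy()
    for _ in range(steps):
        K = K - lr * two_point_gradient(cost_fn, K, radius, rng)
    return K
```

On a small stable system, starting from a stabilizing gain such as `K0 = 0`, this kind of update typically lowers the rollout cost. The perturbations themselves incur extra cost, which is the exploration cost in policy space that the abstract refers to.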

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Model Reduction and Neural Networks