Naive Exploration is Optimal for Online LQR

Max Simchowitz; Dylan J. Foster

arXiv:2001.09576 · cs.LG · October 5, 2023 · 33 cites

TL;DR

This paper proves that naive exploration with certainty equivalent control is optimal for online LQR, establishing tight bounds on regret that depend on system dimensions and ruling out polylogarithmic regret algorithms.

Contribution

The paper proves new upper and lower bounds on regret for online LQR, shows that a simple exploration strategy attains the optimal rate, and develops the self-bounding ODE method for analyzing perturbations of Riccati equations.

Findings

01

Optimal regret scales as $\widetilde{\Theta}(\sqrt{d_{u}^{2} d_{x} T})$

02

Lower bounds exclude polylogarithmic regret algorithms

03

Simple certainty equivalent control with exploration is optimal

Abstract

We consider the problem of online adaptive control of the linear quadratic regulator, where the true system parameters are unknown. We prove new upper and lower bounds demonstrating that the optimal regret scales as $\widetilde{\Theta}(\sqrt{d_{u}^{2} d_{x} T})$, where $T$ is the number of time steps, $d_{u}$ is the dimension of the input space, and $d_{x}$ is the dimension of the system state. Notably, our lower bounds rule out the possibility of a $\mathrm{poly}(\log T)$-regret algorithm, which had been conjectured due to the apparent strong convexity of the problem. Our upper bound is attained by a simple variant of certainty equivalent control, where the learner selects control inputs according to the optimal controller for their estimate of the system while injecting exploratory random noise. While this approach was shown to achieve…
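The certainty equivalent scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a scalar system ($d_{x} = d_{u} = 1$), unit costs, unit-variance process noise, a doubling epoch schedule, and an exploration variance that decays like $1/\sqrt{t}$. The names `riccati_gain`, `least_squares`, and `run_ce_control`, and all constants (`warmup`, `ridge`, the true parameters) are hypothetical choices made for illustration.

```python
import random


def riccati_gain(a, b, q=1.0, r=1.0, iters=500):
    """Iterate the scalar discrete-time Riccati recursion.

    Returns (k, p), where u = k * x is the optimal policy for the model (a, b).
    """
    p = q
    for _ in range(iters):
        abp = a * b * p
        p = q + a * a * p - abp * abp / (r + b * b * p)
    k = -(a * b * p) / (r + b * b * p)
    return k, p


def least_squares(sxx, sxu, suu, sxy, suy, ridge=1e-3):
    """Solve the 2x2 ridge-regularized normal equations for (a_hat, b_hat)."""
    m11, m12, m22 = sxx + ridge, sxu, suu + ridge
    det = m11 * m22 - m12 * m12
    a_hat = (m22 * sxy - m12 * suy) / det
    b_hat = (m11 * suy - m12 * sxy) / det
    return a_hat, b_hat


def run_ce_control(a_true=0.9, b_true=0.5, horizon=2000, warmup=100, seed=0):
    """Certainty equivalent control with injected exploration noise (sketch)."""
    rng = random.Random(seed)
    q, r = 1.0, 1.0
    sxx = sxu = suu = sxy = suy = 0.0
    # Assumption: k = 0 is a stabilizing initial controller because |a_true| < 1.
    x, k, sigma = 0.0, 0.0, 1.0
    a_hat, b_hat = 0.0, 0.0
    next_update, total_cost = warmup, 0.0
    for t in range(horizon):
        if t == next_update:
            # Refit the model on all data so far, then act as if it were exact.
            a_hat, b_hat = least_squares(sxx, sxu, suu, sxy, suy)
            k, _ = riccati_gain(a_hat, b_hat, q, r)
            # Assumed schedule: exploration variance proportional to 1/sqrt(t).
            sigma = (warmup / t) ** 0.25
            next_update *= 2  # doubling epochs
        u = k * x + sigma * rng.gauss(0.0, 1.0)
        total_cost += q * x * x + r * u * u
        x_next = a_true * x + b_true * u + rng.gauss(0.0, 1.0)
        # Accumulate sufficient statistics for regressing x_next on (x, u).
        sxx += x * x
        sxu += x * u
        suu += u * u
        sxy += x * x_next
        suy += u * x_next
        x = x_next
    return a_hat, b_hat, k, total_cost / horizon


if __name__ == "__main__":
    a_hat, b_hat, k, avg_cost = run_ce_control()
    print("estimates:", a_hat, b_hat, "gain:", k, "average cost:", avg_cost)
```

The scalar setting hides the dimension dependence that the regret bound makes explicit; it is meant only to show the loop of estimating, solving the Riccati equation for the estimate, and playing the resulting controller plus noise.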

Videos

Naive Exploration is Optimal for Online LQR · SlidesLive

Taxonomy

Topics: Advanced Bandit Algorithms Research · Model Reduction and Neural Networks · Advanced Adaptive Filtering Techniques