Naive Exploration is Optimal for Online LQR
Max Simchowitz, Dylan J. Foster

TL;DR
This paper proves that naive exploration with certainty equivalent control is optimal for online LQR, establishing tight bounds on regret that depend on system dimensions and ruling out polylogarithmic regret algorithms.
Contribution
It introduces new upper and lower bounds for online LQR regret, demonstrating the optimality of a simple exploration strategy and developing the self-bounding ODE method for Riccati equations.
Findings
Optimal regret scales as $\widetilde{\Theta}(\sqrt{d_{u}^2 d_{x} T})$
Lower bounds exclude polylogarithmic regret algorithms
Simple certainty equivalent control with exploration is optimal
Abstract
We consider the problem of online adaptive control of the linear quadratic regulator, where the true system parameters are unknown. We prove new upper and lower bounds demonstrating that the optimal regret scales as $\widetilde{\Theta}(\sqrt{d_{u}^2 d_{x} T})$, where $T$ is the number of time steps, $d_{u}$ is the dimension of the input space, and $d_{x}$ is the dimension of the system state. Notably, our lower bounds rule out the possibility of a $\mathrm{poly}(\log T)$-regret algorithm, which had been conjectured due to the apparent strong convexity of the problem. Our upper bound is attained by a simple variant of certainty equivalent control, where the learner selects control inputs according to the optimal controller for their estimate of the system while injecting exploratory random noise. While this approach was shown to achieve…
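The certainty equivalent scheme described in the abstract can be illustrated with a minimal simulation sketch. This is not the paper's exact algorithm: the function names `dare_gain` and `run_ce_control`, the identity cost matrices, the Gaussian process noise, the doubling epoch schedule starting at 64 steps, and the exploration standard deviation of $(t+1)^{-1/4}$ are all assumptions chosen for illustration, and the sketch omits any safeguard for when an estimate yields a destabilizing controller.

```python
import numpy as np

def dare_gain(A, B, Q, R, iters=500):
    """Iterate the discrete Riccati recursion; return K for the law u = K x."""
    P = Q.copy()
    for _ in range(iters):
        BtP = B.T @ P
        K = -np.linalg.solve(R + BtP @ B, BtP @ A)
        P = Q + A.T @ P @ (A + B @ K)
        P = (P + P.T) / 2  # keep P symmetric against round-off
    return K

def run_ce_control(A, B, K0, T=2000, seed=0):
    """Certainty equivalent control with decaying exploration noise.

    A, B are the true (unknown to the learner) system matrices, used
    only to simulate x_{t+1} = A x_t + B u_t + w_t. K0 is an initial
    stabilizing gain (assumed given).
    """
    rng = np.random.default_rng(seed)
    dx, du = B.shape
    Q, R = np.eye(dx), np.eye(du)      # assumed cost matrices
    x, K = np.zeros(dx), K0
    Z, Y = [], []                      # regressors [x; u] and next states
    A_hat = B_hat = None
    cost = 0.0
    epoch_end = 64                     # assumed first epoch length
    for t in range(T):
        sigma = (t + 1) ** -0.25       # assumed exploration schedule
        u = K @ x + sigma * rng.standard_normal(du)
        cost += x @ Q @ x + u @ R @ u
        x_next = A @ x + B @ u + rng.standard_normal(dx)
        Z.append(np.concatenate([x, u]))
        Y.append(x_next)
        x = x_next
        if t + 1 == epoch_end:
            # Least-squares estimate of [A B] from all data so far.
            theta, *_ = np.linalg.lstsq(np.array(Z), np.array(Y), rcond=None)
            A_hat, B_hat = theta[:dx].T, theta[dx:].T
            # Certainty equivalence: act as if the estimate were exact.
            K = dare_gain(A_hat, B_hat, Q, R)
            epoch_end *= 2
    return A_hat, B_hat, K, cost / T
```

For example, calling `run_ce_control(A, B, np.zeros((1, 2)))` with a stable two-state, one-input system returns the latest estimates, the latest gain, and the average per-step cost. The estimate is refit at the end of each doubling epoch, which mirrors the abstract's point that the learner continually refines its estimate of the system.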
Taxonomy
Topics: Advanced Bandit Algorithms Research · Model Reduction and Neural Networks · Advanced Adaptive Filtering Techniques
