Logarithmic Regret for Online Control
Naman Agarwal, Elad Hazan, Karan Singh

TL;DR
This paper demonstrates that in linear control systems with adversarial costs, the optimal regret can be reduced from the traditional $O(\sqrt{T})$ to polylogarithmic in $T$, using efficient online algorithms.
Contribution
It establishes a significantly improved regret bound of polylogarithmic order for linear control with adversarial costs, and introduces two efficient algorithms to achieve this.
Findings
Optimal regret scales as polylogarithmic in T.
Two efficient algorithms, online gradient descent and natural gradient, attain this bound.
Traditional $O(\sqrt{T})$ regret bounds are surpassed by these methods.
Abstract
We study optimal regret bounds for control in linear dynamical systems under adversarially changing strongly convex cost functions, given the knowledge of transition dynamics. This includes several well studied and fundamental frameworks such as the Kalman filter and the linear quadratic regulator. State of the art methods achieve regret which scales as $O(\sqrt{T})$, where $T$ is the time horizon. We show that the optimal regret in this setting can be significantly smaller, scaling as $O(\text{poly}(\log T))$. This regret bound is achieved by two different efficient iterative methods, online gradient descent and online natural gradient.
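To make the role of strong convexity concrete, here is a minimal sketch of projected online gradient descent with step size 1/(alpha * t) on scalar quadratic losses. It is not the paper's control algorithm: the loss family f_t(x) = 0.5 * alpha * (x - z_t)^2, the interval domain, and the function names are all assumptions chosen for illustration. It only shows the standard online-learning fact that strongly convex losses allow regret of order log T rather than sqrt(T).

```python
import math
import random


def ogd_strongly_convex(targets, alpha=1.0, radius=1.0):
    """Projected OGD on f_t(x) = 0.5 * alpha * (x - z_t)^2 (illustrative losses).

    Uses the step size 1 / (alpha * t), the usual choice for
    alpha-strongly convex losses. Returns the learner's cumulative loss.
    """
    x = 0.0
    total_loss = 0.0
    for t, z in enumerate(targets, start=1):
        # The learner commits to x, then the loss for round t is revealed.
        total_loss += 0.5 * alpha * (x - z) ** 2
        grad = alpha * (x - z)
        x -= grad / (alpha * t)
        # Project back onto the interval [-radius, radius].
        x = max(-radius, min(radius, x))
    return total_loss


def best_fixed_loss(targets, alpha=1.0):
    """Cumulative loss of the best fixed point in hindsight (the mean)."""
    mean = sum(targets) / len(targets)
    return sum(0.5 * alpha * (mean - z) ** 2 for z in targets)


def regret(T, seed=0):
    """Regret of OGD against the best fixed point on T random targets."""
    rng = random.Random(seed)
    targets = [rng.uniform(-1.0, 1.0) for _ in range(T)]
    return ogd_strongly_convex(targets) - best_fixed_loss(targets)


def log_regret_bound(T, grad_bound=2.0, alpha=1.0):
    """Standard bound (G^2 / (2 * alpha)) * (1 + log T) for this step size."""
    return grad_bound ** 2 / (2.0 * alpha) * (1.0 + math.log(T))


if __name__ == "__main__":
    for T in (100, 1000, 10000):
        print(T, regret(T), log_regret_bound(T), math.sqrt(T))
```

In this toy setting the gradient magnitude is at most 2 (both x and z_t lie in [-1, 1]), so the logarithmic bound applies with G = 2 and alpha = 1. The paper's contribution is obtaining a bound of this flavor for control with known linear dynamics, which this sketch does not attempt to model.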
Taxonomy
Topics: Advanced Bandit Algorithms Research · Adaptive Dynamic Programming Control
