Logarithmic Regret for Online Control

Naman Agarwal; Elad Hazan; Karan Singh

arXiv:1909.05062·cs.LG·September 12, 2019·51 cites

TL;DR

This paper shows that for control of known linear dynamical systems under adversarially changing, strongly convex costs, regret can be reduced from the previous state-of-the-art $O(\sqrt{T})$ to polylogarithmic in $T$, using efficient online algorithms.

Contribution

It establishes a polylogarithmic regret bound for linear control with known dynamics and adversarially changing, strongly convex costs, and gives two efficient algorithms that achieve it.

Findings

01

The optimal regret scales polylogarithmically in $T$.

02

Two efficient algorithms, online gradient descent and online natural gradient, attain this bound.

03

These methods improve on the previous state-of-the-art $O(\sqrt{T})$ regret bounds.

Abstract

We study optimal regret bounds for control in linear dynamical systems under adversarially changing strongly convex cost functions, given the knowledge of transition dynamics. This includes several well-studied and fundamental frameworks such as the Kalman filter and the linear quadratic regulator. State-of-the-art methods achieve regret which scales as $O(\sqrt{T})$, where $T$ is the time horizon. We show that the optimal regret in this setting can be significantly smaller, scaling as $O(\mathrm{poly}(\log T))$. This regret bound is achieved by two different efficient iterative methods, online gradient descent and online natural gradient.
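To give some intuition for why strong convexity allows logarithmic rather than $O(\sqrt{T})$ regret, the sketch below runs online gradient descent in the plain online convex optimization setting, with no dynamics or state. It is not the paper's control algorithm. The scalar losses $f_t(x) = \frac{\alpha}{2}(x - z_t)^2$, the interval $[-1, 1]$, the random targets, and the names `ogd_strongly_convex`, `loss`, and `regret` are all assumptions made for illustration. The step size $1/(\alpha t)$ is the standard choice for $\alpha$-strongly convex losses, for which regret is known to be at most $\frac{G^2}{2\alpha}(1 + \log T)$ when gradients are bounded by $G$.

```python
import math
import random


def ogd_strongly_convex(targets, alpha=1.0, radius=1.0):
    """Online gradient descent on f_t(x) = (alpha / 2) * (x - z_t)^2 over [-radius, radius].

    Illustrative only: a memoryless online convex optimization loop,
    not the control setting studied in the paper.
    """
    x = 0.0
    played = []
    for t, z in enumerate(targets, start=1):
        played.append(x)                   # commit to x_t before f_t is revealed
        grad = alpha * (x - z)             # gradient of f_t at x_t
        x = x - grad / (alpha * t)         # step size eta_t = 1 / (alpha * t)
        x = max(-radius, min(radius, x))   # project back onto the feasible interval
    return played


def loss(x, z, alpha=1.0):
    return 0.5 * alpha * (x - z) ** 2


def regret(targets, played, alpha=1.0):
    """Regret against the best fixed point in hindsight (the mean of the targets)."""
    best = sum(targets) / len(targets)
    incurred = sum(loss(x, z, alpha) for x, z in zip(played, targets))
    optimal = sum(loss(best, z, alpha) for z in targets)
    return incurred - optimal


random.seed(0)
T = 2000
targets = [random.uniform(-1.0, 1.0) for _ in range(T)]  # hypothetical loss sequence
played = ogd_strongly_convex(targets)
R = regret(targets, played)

G = 2.0  # gradient bound: |x - z| <= 2 on [-1, 1] when alpha = 1
bound = G ** 2 / 2.0 * (1.0 + math.log(T))
print(f"regret = {R:.3f}, log bound = {bound:.3f}, sqrt(T) = {math.sqrt(T):.3f}")
```

Increasing `T` in this sketch should show the regret growing roughly like $\log T$. The control setting in the paper is harder because each action also affects future states, so this toy loop only illustrates the role of the strongly convex step-size schedule.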

Taxonomy

Topics: Advanced Bandit Algorithms Research · Adaptive Dynamic Programming Control