Minimal Expected Regret in Linear Quadratic Control

Yassir Jedra; Alexandre Proutiere

arXiv:2109.14429·cs.LG·September 30, 2021

TL;DR

This paper introduces an online learning algorithm for Linear Quadratic Control with unknown system matrices, achieving near-optimal regret bounds that adapt to different levels of system knowledge and allowing frequent policy updates.

Contribution

The paper presents a simple certainty-equivalence control algorithm whose policy is updated continually rather than in epochs. It comes with provable regret bounds, improves upon epoch-based methods, and matches known lower bounds in key scenarios.
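To make the certainty-equivalence idea concrete, here is a minimal sketch for a scalar system $x_{t+1} = a x_t + b u_t + w_t$. It is an illustration under stated assumptions, not the authors' algorithm: the scalar setting, the unit costs `q` and `r`, the exploration noise decaying like $t^{-1/4}$, the ridge term, and the crude fallback to `u = 0` when the state grows large (which assumes the open-loop system is stable) are all hypothetical choices made for this example. At every step the sketch re-estimates $(a, b)$ by least squares and recomputes the gain as if the estimates were exact.

```python
import random


def riccati_gain(a, b, q=1.0, r=1.0, iters=200):
    """Scalar discrete-time Riccati iteration; returns k for the law u = -k * x."""
    p = q
    for _ in range(iters):
        # Same as q + a^2 p - (a b p)^2 / (r + b^2 p), rearranged to avoid overflow.
        p = min(q + a * a * p * r / (r + b * b * p), 1e12)
    return a * b * p / (r + b * b * p)


def least_squares_ab(sxx, sxu, suu, sxy, suy, ridge=1e-6):
    """Solve the 2x2 ridge-regularized normal equations for (a, b)."""
    m11, m12, m22 = sxx + ridge, sxu, suu + ridge
    det = m11 * m22 - m12 * m12
    a_hat = (m22 * sxy - m12 * suy) / det
    b_hat = (m11 * suy - m12 * sxy) / det
    return a_hat, b_hat


def run_certainty_equivalence(a=0.9, b=0.5, horizon=2000, x_max=50.0, seed=0):
    """Simulate the loop: act, observe, re-estimate, recompute the gain."""
    rng = random.Random(seed)
    x = 0.0
    sxx = sxu = suu = sxy = suy = 0.0  # running sums for least squares
    a_hat, b_hat, k = 0.0, 0.0, 0.0
    for t in range(1, horizon + 1):
        if abs(x) > x_max:
            # Assumed fallback: u = 0 is stabilizing because |a| < 1 here.
            u = 0.0
        else:
            # Certainty-equivalent action plus decaying exploration (assumption).
            u = -k * x + rng.gauss(0.0, 1.0) / t ** 0.25
        x_next = a * x + b * u + rng.gauss(0.0, 1.0)
        sxx += x * x
        sxu += x * u
        suu += u * u
        sxy += x * x_next
        suy += u * x_next
        # Update the estimate and the policy at every step, not per epoch.
        a_hat, b_hat = least_squares_ab(sxx, sxu, suu, sxy, suy)
        k = riccati_gain(a_hat, b_hat)
        x = x_next
    return a_hat, b_hat, k
```

Calling `run_certainty_equivalence()` returns the final estimates and gain. Recomputing the gain after every observation is what distinguishes this loop from an epoch-based scheme, which would hold `k` fixed over long blocks of time.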

Findings

01. Regret bounds scale optimally with time and system dimensions.
02. Algorithm allows frequent updates, improving analysis and performance.
03. Proves near-optimal regret in multiple unknown system scenarios.

Abstract

We consider the problem of online learning in Linear Quadratic Control systems whose state transition and state-action transition matrices $A$ and $B$ may be initially unknown. We devise an online learning algorithm and provide guarantees on its expected regret. This regret at time $T$ is upper bounded (i) by $\widetilde{O}((d_u + d_x)\sqrt{d_x T})$ when $A$ and $B$ are unknown, (ii) by $O(d_x^2 \log(T))$ if only $A$ is unknown, and (iii) by $O(d_x (d_u + d_x) \log(T))$ if only $B$ is unknown and under some mild non-degeneracy condition ($d_x$ and $d_u$ denote the dimensions of the state and of the control input, respectively). These regret scalings are minimal in $T$, $d_x$ and $d_u$ as they match existing lower bounds in scenario (i) when $d_x \leq d_u$ [SF20], and in scenario (ii) [lai1986]. We conjecture that our upper bounds are also optimal in scenario (iii) (there…
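As a rough numerical illustration of how the three scenarios compare, the sketch below evaluates the leading terms of the bounds with all constants and logarithmic factors dropped. It assumes scenario (i) is read as the square-root rate $(d_u + d_x)\sqrt{d_x T}$; the function name `regret_scalings` and the example dimensions are hypothetical.

```python
import math


def regret_scalings(d_x, d_u, T):
    """Leading terms of the three regret bounds, constants and log factors omitted."""
    return {
        "A and B unknown": (d_u + d_x) * math.sqrt(d_x * T),
        "only A unknown": d_x ** 2 * math.log(T),
        "only B unknown": d_x * (d_u + d_x) * math.log(T),
    }


if __name__ == "__main__":
    for name, value in regret_scalings(d_x=2, d_u=3, T=10_000).items():
        print(f"{name}: {value:.1f}")
```

With these example values the fully unknown case is larger by roughly an order of magnitude, which reflects the gap between square-root and logarithmic growth in $T$.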

Taxonomy

Topics: Advanced Bandit Algorithms Research · Receptor Mechanisms and Signaling · Reinforcement Learning in Robotics