Minimal Expected Regret in Linear Quadratic Control
Yassir Jedra, Alexandre Proutiere

TL;DR
This paper introduces an online learning algorithm for Linear Quadratic Control with unknown system matrices, achieving near-optimal regret bounds that adapt to different levels of system knowledge and allowing frequent policy updates.
Contribution
The paper presents a simple certainty-equivalence control algorithm whose estimates and policy can be updated continually rather than once per epoch. It comes with provable regret bounds that improve upon epoch-based methods and match known lower bounds in key scenarios.
Findings
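The regret bounds match existing lower bounds in the time horizon and in the state and input dimensions, both when the two system matrices are unknown (under a condition on the dimensions) and when only the state transition matrix is unknown.
The algorithm can update its estimates and control policy frequently instead of once per epoch.
The paper proves near-optimal regret in three scenarios: both matrices unknown, only the state transition matrix unknown, and only the state-action transition matrix unknown.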
Abstract
We consider the problem of online learning in Linear Quadratic Control systems whose state transition and state-action transition matrices $A$ and $B$ may be initially unknown. We devise an online learning algorithm and provide guarantees on its expected regret. This regret at time $T$ is upper bounded (i) by $\widetilde{O}((d_u+d_x)\sqrt{d_x T})$ when $A$ and $B$ are unknown, (ii) by $\widetilde{O}(d_x^2 \log(T))$ if only $A$ is unknown, and (iii) by $\widetilde{O}(d_x(d_u+d_x)\log(T))$ if only $B$ is unknown and under some mild non-degeneracy condition ($d_x$ and $d_u$ denote the dimensions of the state and of the control input, respectively). These regret scalings are minimal in $T$, $d_x$ and $d_u$ as they match existing lower bounds in scenario (i) when $d_x \le d_u$ [SF20], and in scenario (ii) [lai1986]. We conjecture that our upper bounds are also optimal in scenario (iii) (there…
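For readers who want the quantities behind these bounds, the block below writes out the standard Linear Quadratic Control formulation. It is stated as an assumption for orientation, not quoted from the paper: the cost matrices $Q$ and $R$, the noise term $w_t$, the optimal average cost $J^\star$, and the symbol $\mathcal{R}_T$ for the regret are notation introduced here, with $A$ and $B$ denoting the state transition and state-action transition matrices.

```latex
% Assumed standard setup (illustrative notation, not taken from the paper)
\begin{align*}
  x_{t+1} &= A x_t + B u_t + w_t,
  && x_t \in \mathbb{R}^{d_x},\; u_t \in \mathbb{R}^{d_u}, \\
  \mathcal{R}_T &= \sum_{t=1}^{T} \left( x_t^\top Q x_t + u_t^\top R u_t \right) - T\, J^\star,
  && J^\star = \text{optimal average cost when } A \text{ and } B \text{ are known}.
\end{align*}
```

Under this reading, the abstract's guarantees are upper bounds on the expectation $\mathbb{E}[\mathcal{R}_T]$: the extra cost, accumulated over $T$ steps, of having to learn the unknown matrices while controlling the system.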
Taxonomy
Topics: Advanced Bandit Algorithms Research · Receptor Mechanisms and Signaling · Reinforcement Learning in Robotics
