Optimistic Online LQR via Intrinsic Rewards
Marcell Bartos, Bruce D. Lee, Lenart Treven, Andreas Krause, Florian Dörfler, Melanie N. Zeilinger

TL;DR
This paper introduces IR-LQR, an efficient, optimistic online LQR algorithm that uses intrinsic rewards for uncertainty-driven exploration, achieving optimal regret rates in unknown linear dynamical systems.
Contribution
The paper proposes IR-LQR, a simple and computationally cheap method that modifies only the cost function with intrinsic rewards, in contrast to more complex existing optimistic online LQR formulations.
Findings
IR-LQR achieves the optimal worst-case regret rate of √T.
IR-LQR outperforms existing online LQR algorithms in numerical experiments.
IR-LQR maintains the standard LQR structure, simplifying implementation.
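For context, regret in online LQR is commonly measured against the optimal average cost of the true system. A standard form is given here as a sketch; the symbols (stage-cost weights Q and R, optimal average cost J^⋆) are assumptions and are not notation taken from the paper:

\[
\mathrm{Regret}_T \;=\; \sum_{t=0}^{T-1} \left( x_t^\top Q\, x_t + u_t^\top R\, u_t \right) \;-\; T\, J^\star
\]

Under this convention, a √T rate means the cumulative excess cost grows on the order of √T (possibly up to logarithmic factors), so the average per-step excess cost vanishes as T grows.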
Abstract
Optimism in the face of uncertainty is a popular approach to balance exploration and exploitation in reinforcement learning. Here, we consider the online linear quadratic regulator (LQR) problem, i.e., to learn the LQR corresponding to an unknown linear dynamical system by adapting the control policy online based on closed-loop data collected during operation. In this work, we propose Intrinsic Rewards LQR (IR-LQR), an optimistic online LQR algorithm that applies the idea of intrinsic rewards originating from reinforcement learning and the concept of variance regularization to promote uncertainty-driven exploration. IR-LQR retains the structure of a standard LQR synthesis problem by only modifying the cost function, resulting in an intuitively pleasing, simple, computationally cheap, and efficient algorithm. This is in contrast to existing optimistic online LQR formulations that rely on…
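To make "only modifying the cost function" concrete, here is a minimal scalar sketch in Python. It is not the paper's algorithm: the bonus form (subtracting `lam / v` from each cost weight, where `v_x` and `v_u` stand for the amount of data collected along the state and input directions), the clipping floor, and all function and parameter names are hypothetical choices made for illustration. The only point is that the synthesis step stays a standard Riccati solve once the cost weights have been adjusted.

```python
def scalar_lqr_gain(a, b, q, r, iters=500):
    """Standard scalar discrete-time LQR via Riccati fixed-point iteration.

    System: x_next = a * x + b * u, stage cost: q * x**2 + r * u**2.
    Returns (k, p) with control law u = -k * x and cost-to-go p * x**2.
    """
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    k = (a * b * p) / (r + b * b * p)
    return k, p


def intrinsic_reward_cost(q, r, v_x, v_u, lam=1.0, floor=1e-3):
    """Hypothetical intrinsic-reward adjustment of the cost weights.

    Directions with little data (small v_x or v_u) receive a larger
    bonus and therefore become cheaper, which encourages visiting them.
    The floor keeps the modified weights positive so that the Riccati
    iteration remains well posed. This form is an assumption for
    illustration, not the formula from the paper.
    """
    q_tilde = max(q - lam / v_x, floor)
    r_tilde = max(r - lam / v_u, floor)
    return q_tilde, r_tilde


# Model estimates (assumed known here to keep the sketch short).
a_hat, b_hat = 0.9, 0.5
q, r = 1.0, 1.0

# Certainty-equivalent gain with the original cost.
k_std, _ = scalar_lqr_gain(a_hat, b_hat, q, r)

# Early on: little input excitation, so input effort is made cheaper.
q_early, r_early = intrinsic_reward_cost(q, r, v_x=1e9, v_u=2.0)
k_early, _ = scalar_lqr_gain(a_hat, b_hat, q_early, r_early)

# Later: plenty of data, so the bonus is negligible.
q_late, r_late = intrinsic_reward_cost(q, r, v_x=1e9, v_u=1e9)
k_late, _ = scalar_lqr_gain(a_hat, b_hat, q_late, r_late)

print(f"standard gain: {k_std:.4f}")
print(f"gain with large bonus: {k_early:.4f}")
print(f"gain with negligible bonus: {k_late:.4f}")
```

In this toy setting the bonus shrinks as data accumulates, so the modified problem approaches the certainty-equivalent LQR and the exploratory behavior fades without any change to the solver itself.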
