Rate-matching the regret lower-bound in the linear quadratic regulator with unknown dynamics
Feicheng Wang, Lucas Janson

TL;DR
This paper establishes a regret upper-bound of O(√T) for the linear quadratic regulator (LQR) with unknown dynamics, matching the rate of the known lower-bound and closing the gap between the two.
Contribution
The paper proves a novel regret upper-bound of O(√T), which matches the rate of the known lower-bound. The proof is constructive: it analyzes the regret of a concrete algorithm.
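For readers unfamiliar with the setting, the standard LQR formulation and the usual notion of regret can be written as below. The symbols (A, B, Q, R, ε_t, J*) are the conventional ones and are assumed here for illustration; the paper's exact notation and assumptions may differ.

```latex
% Linear dynamics with unknown (A, B) and i.i.d. noise \varepsilon_t
x_{t+1} = A x_t + B u_t + \varepsilon_t

% Quadratic cost accumulated over T steps
J_T = \sum_{t=1}^{T} \left( x_t^\top Q x_t + u_t^\top R u_t \right)

% Regret: excess cost over T times the optimal long-run average cost J^*
% (the cost achieved by the optimal controller when A and B are known)
\mathrm{Regret}(T) = J_T - T J^{*}

% The rate-matching claim: upper and lower bounds agree up to constants
\mathrm{Regret}(T) = O(\sqrt{T}) \quad \text{and} \quad \mathrm{Regret}(T) = \Omega(\sqrt{T})
```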
Findings
Established a regret upper-bound of O(√T) for the LQR with unknown dynamics.
Provided an estimation error bound on system dynamics of O(T^{-1/4}).
Enhanced proof techniques with precise bounds on the Gram matrix and a self-bounding argument.
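The quantities in these findings (regret, the estimation error of the dynamics, and the Gram matrix) can be made concrete with a small simulation. The following is a minimal sketch of certainty-equivalent control on a scalar system, not the algorithm analyzed in the paper. The scalar system, the warm-up phase, the tiny ridge term, and the exploration noise with standard deviation t^(-1/4) are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import random


def riccati(a, b, q=1.0, r=1.0, iters=50):
    """Iterate the scalar discrete-time Riccati equation.

    Returns (p, k): the cost-to-go coefficient and the gain for u = k * x.
    """
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    k = -(a * b * p) / (r + b * b * p)
    return p, k


def solve_2x2(gram, cross):
    """Solve the normal equations gram @ theta = cross for theta = (a_hat, b_hat)."""
    g11, g12, g22 = gram
    c1, c2 = cross
    det = g11 * g22 - g12 * g12
    a_hat = (g22 * c1 - g12 * c2) / det
    b_hat = (g11 * c2 - g12 * c1) / det
    return a_hat, b_hat


def simulate(T=5000, a=0.8, b=1.0, q=1.0, r=1.0, warmup=50, seed=0):
    """Run certainty-equivalent control on x' = a*x + b*u + w with unit-variance w.

    Assumptions of this sketch: the open-loop system is stable (|a| < 1), so a
    zero gain is safe during warm-up; exploration noise has std t**-0.25.
    """
    rng = random.Random(seed)
    p_star, _ = riccati(a, b, q, r)   # optimal average cost equals p_star for unit noise
    g11, g12, g22 = 1e-6, 0.0, 1e-6   # Gram matrix of z_t = (x_t, u_t), with a tiny ridge
    c1, c2 = 0.0, 0.0                 # cross moments: sum of z_t * x_{t+1}
    x, k, total_cost = 0.0, 0.0, 0.0
    for t in range(1, T + 1):
        if t > warmup:
            # Re-estimate (a, b) by least squares and plug into the Riccati equation.
            a_hat, b_hat = solve_2x2((g11, g12, g22), (c1, c2))
            _, k = riccati(a_hat, b_hat, q, r)
        noise_std = 1.0 if t <= warmup else t ** -0.25
        u = k * x + noise_std * rng.gauss(0.0, 1.0)
        x_next = a * x + b * u + rng.gauss(0.0, 1.0)
        total_cost += q * x * x + r * u * u
        g11 += x * x
        g12 += x * u
        g22 += u * u
        c1 += x * x_next
        c2 += u * x_next
        x = x_next
    a_hat, b_hat = solve_2x2((g11, g12, g22), (c1, c2))
    regret = total_cost - T * p_star
    return a_hat, b_hat, regret


if __name__ == "__main__":
    a_hat, b_hat, regret = simulate()
    print("estimates:", a_hat, b_hat, "regret:", regret)
```

Running `simulate` for several horizons T and seeds, then plotting regret against √T, gives an informal feel for the rate. A single run is noisy and proves nothing about the bound itself.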
Abstract
The theory of reinforcement learning currently suffers from a mismatch between its empirical performance and the theoretical characterization of its performance, with consequences for, e.g., the understanding of sample efficiency, safety, and robustness. The linear quadratic regulator with unknown dynamics is a fundamental reinforcement learning setting with significant structure in its dynamics and cost function, yet even in this setting there is a gap between the best known regret lower-bound of Ω(√T) and the best known upper-bound of O(√T polylog(T)). The contribution of this paper is to close that gap by establishing a novel regret upper-bound of O(√T). Our proof is constructive in that it analyzes the regret of a concrete algorithm, and simultaneously establishes an estimation error bound on the dynamics of O(T^{-1/4}) which is also…
Taxonomy
Topics: Advanced Bandit Algorithms Research · Age of Information Optimization · Receptor Mechanisms and Signaling
