Data-Enabled Policy Optimization for Direct Adaptive Learning of the LQR
Feiran Zhao, Florian Dörfler, Alessandro Chiuso, Keyou You

TL;DR
This paper introduces data-enabled policy optimization (DeePO), an online direct data-driven method for adaptive LQR control. The method comes with theoretical guarantees and an explicit recursive policy update, and simulations demonstrate its efficiency.
Contribution
The paper proposes a policy parameterization based on the sample covariance and a direct policy optimization method for online LQR learning, with proven global convergence and regret bounds.
Findings
DeePO's average regret decays at a rate of O(1/√T), so its cumulative regret grows sublinearly in T.
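Global convergence is established via a projected gradient dominance property, and the adaptive scheme admits an explicit recursive policy update.
Simulations validate the theoretical guarantees and demonstrate the method's efficiency.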
Abstract
Direct data-driven design methods for the linear quadratic regulator (LQR) mainly use offline or episodic data batches, and their online adaptation has been acknowledged as an open problem. In this paper, we propose a direct adaptive method to learn the LQR from online closed-loop data. First, we propose a new policy parameterization based on the sample covariance to formulate a direct data-driven LQR problem, which is shown to be equivalent to the certainty-equivalence LQR with optimal non-asymptotic guarantees. Second, we design a novel data-enabled policy optimization (DeePO) method to directly update the policy, where the gradient is explicitly computed using only a batch of persistently exciting (PE) data. Third, we establish its global convergence via a projected gradient dominance property. Importantly, we efficiently use DeePO to adaptively learn the LQR by performing only…
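The covariance parameterization mentioned in the abstract can be illustrated numerically. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the data are stacked as D0 = [U0; X0], that the sample covariance is Phi = D0 D0^T / t, and that a gain K is encoded by a matrix V through [K; I] = Phi V. The system matrices, dimensions, noise level, and the example gain are hypothetical and serve only to generate data. The sketch checks that the closed-loop matrix computed from data alone coincides with the one obtained from a least-squares (certainty-equivalence) model estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, t = 3, 2, 50  # state dim, input dim, sample count (illustrative values)

# Hypothetical system, used only to generate data; rescaled to spectral radius 0.9.
A = rng.standard_normal((n, n))
A *= 0.9 / np.max(np.abs(np.linalg.eigvals(A)))
B = rng.standard_normal((n, m))

# One trajectory driven by random (persistently exciting) inputs plus small noise.
X = np.zeros((n, t + 1))
U0 = rng.standard_normal((m, t))
W = 0.01 * rng.standard_normal((n, t))
for k in range(t):
    X[:, k + 1] = A @ X[:, k] + B @ U0[:, k] + W[:, k]
X0, X1 = X[:, :-1], X[:, 1:]

# Assumed stacking and covariance definitions (see the lead-in).
D0 = np.vstack([U0, X0])        # input-state data, shape (m + n, t)
Phi = D0 @ D0.T / t             # sample covariance of the data
X1_bar = X1 @ D0.T / t          # cross-covariance of next states with the data
X0_bar = X0 @ D0.T / t          # cross-covariance of current states with the data

# Encode an arbitrary example gain K by solving [K; I] = Phi V for V.
K = -0.1 * rng.standard_normal((m, n))
V = np.linalg.solve(Phi, np.vstack([K, np.eye(n)]))

# Closed-loop matrix expressed with data only.
Acl_data = X1_bar @ V

# Certainty-equivalence route: least-squares estimate [B_hat, A_hat], then A_hat + B_hat K.
BA_hat = X1 @ np.linalg.pinv(D0)
B_hat, A_hat = BA_hat[:, :m], BA_hat[:, m:]
Acl_ce = A_hat + B_hat @ K

print("closed-loop matrices agree:", np.allclose(Acl_data, Acl_ce))
print("constraint X0_bar V = I holds:", np.allclose(X0_bar @ V, np.eye(n)))
```

Under these assumptions, V has a fixed size of (n + m) × n regardless of how many samples are collected, which is what makes a per-sample recursive update plausible; the gradient step itself is not sketched here.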
Taxonomy
Topics: Model Reduction and Neural Networks · Machine Learning and ELM · Iterative Learning Control Systems
