Data-Enabled Policy Optimization for Direct Adaptive Learning of the LQR

Feiran Zhao; Florian Dörfler; Alessandro Chiuso; Keyou You

arXiv:2401.14871 · math.OC · October 7, 2024 · 1 citation

Open Access · 1 Repo

TL;DR

This paper introduces DeePO (data-enabled policy optimization), an online, direct data-driven method for adaptive LQR control. The method comes with theoretical guarantees, admits recursive updates, and is shown to be efficient in simulations.

Contribution

The paper proposes a new policy parameterization based on the sample covariance and a direct policy optimization method for online LQR learning, with proven global convergence and regret bounds.

Findings

01 DeePO's average regret decays sublinearly in time, at a rate of O(1/√T).

02 The method converges globally, and its updates are explicit and recursive.

03 Simulations validate the method's efficiency and the theoretical guarantees.

Abstract

Direct data-driven design methods for the linear quadratic regulator (LQR) mainly use offline or episodic data batches, and their online adaptation has been acknowledged as an open problem. In this paper, we propose a direct adaptive method to learn the LQR from online closed-loop data. First, we propose a new policy parameterization based on the sample covariance to formulate a direct data-driven LQR problem, which is shown to be equivalent to the certainty-equivalence LQR with optimal non-asymptotic guarantees. Second, we design a novel data-enabled policy optimization (DeePO) method to directly update the policy, where the gradient is explicitly computed using only a batch of persistently exciting (PE) data. Third, we establish its global convergence via a projected gradient dominance property. Importantly, we efficiently use DeePO to adaptively learn the LQR by performing only…
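The covariance parameterization and the projected gradient update described in the abstract can be illustrated with a minimal sketch. It assumes the parameterization takes the form [K; I] = ΦV, where Φ is the sample covariance of the stacked input-state data. The toy system, the variable names, and the backtracking step size are illustrative assumptions and are not taken from the paper or its repository.

```python
import numpy as np

def dlyap(F, W):
    """Solve X = F X F^T + W by vectorization (adequate for small systems)."""
    k = F.shape[0]
    x = np.linalg.solve(np.eye(k * k) - np.kron(F, F), W.reshape(-1))
    return x.reshape(k, k)

def cost_and_grad(V, U0b, X1b, Q, R):
    """LQR cost and its gradient in V, with K = U0b V and closed loop X1b V."""
    Acl, K = X1b @ V, U0b @ V
    if np.max(np.abs(np.linalg.eigvals(Acl))) >= 1.0:
        return np.inf, None                      # policy is not stabilizing
    M = Q + K.T @ R @ K
    Sigma = dlyap(Acl, np.eye(Acl.shape[0]))     # Sigma = I + Acl Sigma Acl^T
    P = dlyap(Acl.T, M)                          # P = M + Acl^T P Acl
    grad = 2 * (U0b.T @ R @ U0b + X1b.T @ P @ X1b) @ V @ Sigma
    return np.trace(M @ Sigma), grad

# Hypothetical toy system, used only to generate data.
rng = np.random.default_rng(0)
n, m, T = 2, 1, 30
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(n), np.eye(m)

# Collect one trajectory driven by random (exciting) inputs and small noise.
X = np.zeros((n, T + 1))
U = rng.standard_normal((m, T))
for t in range(T):
    X[:, t + 1] = A @ X[:, t] + B @ U[:, t] + 0.01 * rng.standard_normal(n)
X0, X1 = X[:, :T], X[:, 1:]

# Sample covariance of the stacked data D0 = [U0; X0] and its blocks.
D0 = np.vstack([U, X0])
Phi = D0 @ D0.T / T
U0b, X0b, X1b = Phi[:m], Phi[m:], X1 @ D0.T / T

# Parameterize an initial gain K0 through V: [K0; I] = Phi V.
K0 = np.zeros((m, n))                            # stabilizing because A is stable
V = np.linalg.solve(Phi, np.vstack([K0, np.eye(n)]))

# Least-squares (certainty-equivalence) model, for comparison only.
BA_hat = X1 @ D0.T @ np.linalg.inv(D0 @ D0.T)
B_hat, A_hat = BA_hat[:, :m], BA_hat[:, m:]

# One projected gradient step; the projection keeps X0b V = I.
Pi = np.eye(n + m) - np.linalg.pinv(X0b) @ X0b
J0, G = cost_and_grad(V, U0b, X1b, Q, R)
eta = 1.0
for _ in range(60):                              # backtracking is a demo-only choice
    V_new = V - eta * Pi @ G
    J_new, _ = cost_and_grad(V_new, U0b, X1b, Q, R)
    if J_new < J0:
        break
    eta *= 0.5
K_new = U0b @ V_new                              # updated feedback gain
```

In this sketch, `X1b @ V` coincides with `A_hat + B_hat @ K0`, which is one way to see why the covariance-parameterized problem matches the certainty-equivalence LQR. A fixed step size could replace the backtracking loop if a suitable value is known.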

Code & Models

Repositories

feiran-zhao-eth/policy-gradient-adaptive-control

Taxonomy

Topics: Model Reduction and Neural Networks · Machine Learning and ELM · Iterative Learning Control Systems