Direct Data-Driven Linear Quadratic Tracking via Policy Optimization
Shubo Kang, Keyou You

TL;DR
This paper extends data-driven optimal control methods to Linear Quadratic Tracking by introducing a reference-decoupled reformulation, enabling fixed-dimension decision variables and efficient policy optimization.
Contribution
The paper proposes a reference-decoupled reformulation of LQT that admits covariance parameterization with a fixed decision-variable dimension, and it develops offline and online DeePO algorithms with convergence guarantees.
Findings
The reformulation guarantees equivalence to classical LQT solutions.
The offline DeePO algorithm converges linearly under certain conditions.
The optimality gap of the online algorithm decreases linearly, with a dependence on the signal-to-noise ratio (SNR).
Abstract
Direct data-driven optimal control provides an elegant end-to-end paradigm, yet its real-time applicability is often hindered by the growing dimensionality of online decision variables. Recent breakthroughs, notably Data-EnablEd Policy Optimization (DeePO), overcome this bottleneck for the Linear Quadratic Regulator (LQR) through sample-covariance parameterization; however, extending this paradigm to Linear Quadratic Tracking (LQT) poses a fundamental challenge. The core difficulty stems from the intricate coupling between time-varying references and the feedback-feedforward policy structure, which prevents a direct application of constant-dimension parameterization. We first introduce a reference-decoupled reformulation of LQT that naturally accommodates the covariance parameterization, guaranteeing a fixed dimension of decision variables independent of data horizon. This formulation…
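The sample-covariance parameterization mentioned in the abstract can be illustrated for the LQR case. The following is a minimal sketch, assuming the form commonly used in the DeePO literature: data matrices X0, U0, X1 are compressed into covariance-like matrices, and the policy is expressed through a variable V whose size does not depend on the number of samples t. It does not implement the paper's LQT reformulation. The function names, the random test system, and the noiseless data are assumptions made only for illustration.

```python
import numpy as np


def covariance_parameterization(X0, U0, X1):
    """Compress input-state data into sample-covariance matrices.

    X0: n x t states, U0: m x t inputs, X1: n x t successor states.
    Returns Phi ((m+n) x (m+n)) and X1_bar (n x (m+n)); neither grows with t.
    """
    t = X0.shape[1]
    D0 = np.vstack([U0, X0])      # stacked input-state data, (m+n) x t
    Phi = D0 @ D0.T / t           # sample covariance of the data
    X1_bar = X1 @ D0.T / t        # cross term with the successor states
    return Phi, X1_bar


def gain_to_V(Phi, K):
    """Solve Phi @ V = [K; I] for V, given a feedback gain K (m x n)."""
    n = K.shape[1]
    return np.linalg.solve(Phi, np.vstack([K, np.eye(n)]))


def demo(t, seed=0):
    """Check that X1_bar @ V reproduces the closed loop A + B K (noiseless data)."""
    rng = np.random.default_rng(seed)
    n, m = 3, 2
    # Hypothetical system, used only to generate data for the check.
    A = 0.5 * rng.standard_normal((n, n))
    B = rng.standard_normal((n, m))
    X0 = rng.standard_normal((n, t))
    U0 = rng.standard_normal((m, t))
    X1 = A @ X0 + B @ U0          # noiseless one-step transitions
    Phi, X1_bar = covariance_parameterization(X0, U0, X1)
    K = rng.standard_normal((m, n))
    V = gain_to_V(Phi, K)
    return V.shape, np.allclose(X1_bar @ V, A + B @ K)


if __name__ == "__main__":
    for t in (20, 200):
        shape, matches = demo(t)
        print(f"t={t}: V has shape {shape}, closed loop recovered: {matches}")
```

In this sketch V keeps the same (m+n) x n shape whether 20 or 200 samples are used, which is the fixed-dimension property the abstract refers to. With noisy data the identity X1_bar @ V = A + B K would hold only approximately.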
