Policy Gradient Adaptive Control for the LQR: Indirect and Direct Approaches
Feiran Zhao, Alessandro Chiuso, Florian Dörfler

TL;DR
This paper introduces policy gradient adaptive control methods for the LQR that utilize online data to adaptively improve control policies, ensuring stability and convergence through both indirect and direct approaches, with enhanced variants like natural gradient and Gauss-Newton.
Contribution
It develops a unified framework for indirect and direct policy gradient adaptive control (PGAC) of the LQR, covering vanilla gradient, natural gradient, and Gauss-Newton policy updates, and provides stability, convergence, and robustness guarantees.
Findings
Proves stability and convergence of PGAC methods.
Demonstrates robustness and efficiency through simulations.
Introduces regularization to handle noise uncertainty.
Abstract
Motivated by recent advances in reinforcement learning and direct data-driven control, we propose policy gradient adaptive control (PGAC) for the linear quadratic regulator (LQR), which uses online closed-loop data to improve the control policy while maintaining stability. Our method adaptively updates the policy in feedback by descending the gradient of the LQR cost. It is categorized as indirect when gradients are computed via an estimated model, and as direct when gradients are derived from data using sample covariance parameterization. Beyond the vanilla gradient, we also showcase the merits of the natural gradient and Gauss-Newton methods for the policy update. Notably, natural gradient descent bridges the indirect and direct PGAC, and the Gauss-Newton method of the indirect PGAC leads to an adaptive version of the celebrated Hewer's algorithm. To account for the uncertainty from…
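The indirect scheme described in the abstract can be illustrated with a minimal sketch for a scalar system. This is not the authors' implementation. The plant values (a = 1.2, b = 1), the weights, the step size eta, the probing-noise level, the noise-free dynamics, and all function names are assumptions chosen only for illustration. At each time step the sketch fits a model (a_hat, b_hat) to the closed-loop data by least squares, then takes one vanilla gradient step on the LQR cost evaluated with that estimated model.

```python
import random

def lqr_cost_gradient(a, b, q, r, k):
    """Gradient of the scalar LQR cost J(k) for u = k*x (unit initial-state variance)."""
    acl = a + b * k                              # closed-loop pole
    if abs(acl) >= 1.0:
        raise ValueError("gain k is not stabilizing for the given model")
    p = (q + r * k * k) / (1.0 - acl * acl)      # solves p = q + r*k^2 + acl^2 * p
    sigma = 1.0 / (1.0 - acl * acl)              # solves sigma = 1 + acl^2 * sigma
    return 2.0 * ((r + b * b * p) * k + b * p * a) * sigma

def least_squares_model(xs, us, xn):
    """Fit x_next ~ a*x + b*u via the 2x2 normal equations; None if data is degenerate."""
    sxx = sum(x * x for x in xs)
    sxu = sum(x * u for x, u in zip(xs, us))
    suu = sum(u * u for u in us)
    sxy = sum(x * y for x, y in zip(xs, xn))
    suy = sum(u * y for u, y in zip(us, xn))
    det = sxx * suu - sxu * sxu
    if abs(det) < 1e-12:
        return None
    return (suu * sxy - sxu * suy) / det, (sxx * suy - sxu * sxy) / det

def riccati_gain(a, b, q, r, iters=1000):
    """Reference optimal gain from Riccati value iteration (for comparison only)."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return -a * b * p / (r + b * b * p)

def run_indirect_pgac(a=1.2, b=1.0, q=1.0, r=1.0, k0=-0.8,
                      eta=0.05, steps=200, seed=0):
    """One gradient step per time step, using the model estimated from online data."""
    rng = random.Random(seed)
    xs, us, xn = [], [], []
    x, k = 1.0, k0                               # k0 is assumed to be stabilizing
    a_hat = b_hat = None
    for _ in range(steps):
        u = k * x + rng.gauss(0.0, 0.1)          # feedback plus probing noise (assumed)
        x_next = a * x + b * u                   # true plant, noise-free in this sketch
        xs.append(x); us.append(u); xn.append(x_next)
        x = x_next
        est = least_squares_model(xs, us, xn)
        if est is None:
            continue                             # not enough excitation yet
        a_hat, b_hat = est
        if abs(a_hat + b_hat * k) < 1.0:         # safeguard added for this sketch
            k -= eta * lqr_cost_gradient(a_hat, b_hat, q, r, k)
    return k, a_hat, b_hat

if __name__ == "__main__":
    k, a_hat, b_hat = run_indirect_pgac()
    print("learned gain:", k, "optimal gain:", riccati_gain(1.2, 1.0, 1.0, 1.0))
```

Because the simulated plant is noise-free, the least-squares estimate becomes essentially exact once the data is sufficiently exciting, so the gain should approach the Riccati-optimal value. With process noise the estimate is uncertain, which is the setting the abstract's final (truncated) sentence begins to address.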
Taxonomy
Topics: Advanced Control Systems Optimization
