Second-Order Policy Gradient Methods for the Linear Quadratic Regulator

Amirreza Valaei, Arash Bahari Kordabad, and Sadegh Soudjani

arXiv:2511.02095·eess.SY·November 5, 2025

Open Access

TL;DR

This paper introduces second-order policy gradient algorithms for the linear quadratic regulator, leveraging explicit Hessian formulas to accelerate convergence compared to traditional first-order methods.

Contribution

It derives explicit second-order formulas for LQR policy optimization, enabling faster convergence through Gauss-Newton and Newton methods.

Findings

1. Second-order methods converge faster than first-order baselines.
2. Explicit Hessian formulas are derived for LQR.
3. Numerical experiments confirm improved convergence rates.

Abstract

Policy gradient methods are a powerful family of reinforcement learning algorithms for continuous control that optimize a policy directly. However, standard first-order methods often converge slowly. Second-order methods can accelerate learning by using curvature information, but they are typically expensive to compute. The linear quadratic regulator (LQR) is a practical setting in which key quantities, such as the policy gradient, admit closed-form expressions. In this work, we develop second-order policy gradient algorithms for LQR by deriving explicit formulas for both the approximate and exact Hessians used in Gauss-Newton and Newton methods, respectively. Numerical experiments show a faster convergence rate for the proposed second-order approach over the standard first-order policy gradient baseline.
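To make the closed-form quantities mentioned in the abstract concrete, here is a minimal sketch that compares a plain gradient step with a Gauss-Newton step. It assumes the standard discrete-time, infinite-horizon LQR formulation from the policy-gradient literature, with control law u = -K x and a random initial state of covariance Sigma0. This page does not reproduce the paper's own formulas, so the expressions below are the commonly used ones and may differ from the authors' setting. The system matrices, step sizes, and function names are hypothetical and chosen only for illustration. The exact-Hessian Newton update is not sketched, since its formula is not given here.

```python
import numpy as np

def dlyap(M, W):
    """Solve X = M.T @ X @ M + W (needs spectral radius of M below 1)."""
    n = M.shape[0]
    lhs = np.eye(n * n) - np.kron(M.T, M.T)
    return np.linalg.solve(lhs, W.reshape(-1)).reshape(n, n)

# Hypothetical 2-state, 1-input system; the values are illustrative only.
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.eye(1)
Sigma0 = np.eye(2)  # assumed covariance of the initial state

def terms(K):
    """Closed-form quantities for a stabilizing gain K (u = -K x)."""
    A_cl = A - B @ K
    P = dlyap(A_cl, Q + K.T @ R @ K)       # cost-to-go matrix
    Sigma = dlyap(A_cl.T, Sigma0)          # state correlation matrix
    G = R + B.T @ P @ B
    grad = 2.0 * (G @ K - B.T @ P @ A) @ Sigma
    return P, Sigma, G, grad

def cost(K):
    P, _, _, _ = terms(K)
    return float(np.trace(P @ Sigma0))

def gradient_step(K, eta):
    """First-order update: K - eta * grad."""
    _, _, _, grad = terms(K)
    return K - eta * grad

def gauss_newton_step(K, eta):
    """Gauss-Newton update: K - eta * G^{-1} grad Sigma^{-1}."""
    _, Sigma, G, grad = terms(K)
    return K - eta * np.linalg.solve(G, grad) @ np.linalg.inv(Sigma)

# A is stable here, so K = 0 is a valid stabilizing starting point.
K0 = np.zeros((1, 2))
K_gd, K_gn = K0.copy(), K0.copy()
for _ in range(20):
    K_gd = gradient_step(K_gd, 1e-3)
    K_gn = gauss_newton_step(K_gn, 0.5)

print("initial cost:      ", cost(K0))
print("gradient descent:  ", cost(K_gd))
print("Gauss-Newton:      ", cost(K_gn))
```

With a step size of 0.5, the Gauss-Newton update in this formulation reduces to classical policy iteration, which is one way to see why curvature-aware updates can need far fewer iterations than a fixed-step gradient method on the same problem.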

Taxonomy

Topics: Adaptive Dynamic Programming Control · Reinforcement Learning in Robotics · Model Reduction and Neural Networks