A Stochastic Gradient Descent Approach to Design Policy Gradient Methods for LQR

Bowen Song; Simon Weissmann; Mathias Staudigl; Andrea Iannelli

arXiv:2602.18933·eess.SY·February 24, 2026

TL;DR

This paper develops a stochastic gradient descent framework for designing policy gradient algorithms in LQR problems, analyzing convergence with two data-driven gradient estimation schemes and validating with numerical experiments.

Contribution

The paper introduces an SGD-based approach to policy design for LQR that uses two gradient estimation methods, and it provides a convergence analysis for biased stochastic gradient oracles.

Findings

01

Under both gradient estimation schemes, the policy gradient method converges to the optimal policy.

02

The indirect approach estimates system matrices before gradient computation.

03

The direct approach approximates gradients via empirical cost evaluations.
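The indirect scheme in finding 02 can be illustrated with a minimal sketch. It is not the authors' algorithm. It assumes a discrete-time system x_{t+1} = A x_t + B u_t + w_t with policy u = -K x, a batch least-squares fit in place of the paper's online identification, and the standard model-based LQR gradient 2[(R + B^T P_K B)K - B^T P_K A]Σ_K. All names (`dlyap`, `estimate_AB`, `lqr_gradient`) and the example system are hypothetical.

```python
import numpy as np

def dlyap(M, W):
    """Solve X = M X M^T + W by vectorization (adequate for small systems)."""
    n = M.shape[0]
    x = np.linalg.solve(np.eye(n * n) - np.kron(M, M), W.reshape(-1))
    return x.reshape(n, n)

def lqr_cost(A, B, Q, R, K, Sigma0):
    """Infinite-horizon cost trace(P_K Sigma0) for a stabilizing gain K."""
    Acl = A - B @ K
    P = dlyap(Acl.T, Q + K.T @ R @ K)
    return np.trace(P @ Sigma0)

def lqr_gradient(A, B, Q, R, K, Sigma0):
    """Model-based policy gradient of lqr_cost with respect to K."""
    Acl = A - B @ K
    P = dlyap(Acl.T, Q + K.T @ R @ K)   # value matrix P_K
    Sigma = dlyap(Acl, Sigma0)          # state correlation matrix Sigma_K
    return 2.0 * ((R + B.T @ P @ B) @ K - B.T @ P @ A) @ Sigma

def estimate_AB(X, U, Xnext):
    """Least-squares fit of [A B] from columns of states, inputs, next states."""
    Z = np.vstack([X, U]).T
    Theta, *_ = np.linalg.lstsq(Z, Xnext.T, rcond=None)
    n = X.shape[0]
    return Theta.T[:, :n], Theta.T[:, n:]

# Hypothetical example system (assumption, not taken from the paper).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R, Sigma0 = np.eye(2), np.eye(1), np.eye(2)
K = np.array([[1.0, 1.5]])              # stabilizing for this (A, B)

# Collect one noisy closed-loop trajectory with exploration noise on the input.
rng = np.random.default_rng(0)
T = 500
X = np.zeros((2, T + 1))
U = np.zeros((1, T))
for t in range(T):
    U[:, t] = -K @ X[:, t] + rng.standard_normal(1)
    X[:, t + 1] = A @ X[:, t] + B @ U[:, t] + 0.01 * rng.standard_normal(2)

# Step 1: identify the system matrices. Step 2: plug them into the gradient.
A_hat, B_hat = estimate_AB(X[:, :-1], U, X[:, 1:])
grad_hat = lqr_gradient(A_hat, B_hat, Q, R, K, Sigma0)
```

Because `A_hat` and `B_hat` are computed from noisy data, `grad_hat` is a random and generally biased estimate of the true gradient, which is the kind of oracle the convergence analysis has to handle.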

Abstract

In this work, we propose a stochastic gradient descent (SGD) framework to design data-driven policy gradient descent algorithms for the linear quadratic regulator problem. Two alternative schemes are considered to estimate the policy gradient from stochastic trajectory data: (i) an indirect online identification-based approach, in which the system matrices are first estimated and subsequently used to construct the gradient, and (ii) a direct zeroth-order approach, which approximates the gradient using empirical cost evaluations. In both cases, the resulting gradient estimates are random due to stochasticity in the data, allowing us to use SGD theory to analyze the convergence of the associated policy gradient methods. A key technical step consists of modeling the gradient estimates as suitable stochastic gradient oracles, which, because of the way they are computed, are inherently…
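The direct zeroth-order scheme (ii) can be sketched in the same hedged spirit. The page does not give the paper's estimator, so the code below uses a generic two-point random-direction estimator, with the empirical cost taken as an average of finite-horizon noise-free rollouts. The discrete-time model, the function names (`empirical_cost`, `zeroth_order_gradient`), the smoothing radius and the example system are all assumptions made for illustration.

```python
import numpy as np

def empirical_cost(A, B, Q, R, K, X0, horizon):
    """Average finite-horizon cost of u = -K x; columns of X0 are initial states."""
    X = X0.copy()
    total = 0.0
    for _ in range(horizon):
        U = -K @ X
        total += np.sum(X * (Q @ X)) + np.sum(U * (R @ U))
        X = A @ X + B @ U
    return total / X0.shape[1]

def zeroth_order_gradient(cost, K, radius, num_samples, rng):
    """Two-point estimate of the gradient of cost at K using cost values only."""
    d = K.size
    g = np.zeros_like(K)
    for _ in range(num_samples):
        D = rng.standard_normal(K.shape)
        D /= np.linalg.norm(D)          # uniform direction on the unit sphere
        delta = cost(K + radius * D) - cost(K - radius * D)
        g += (d / (2.0 * radius)) * delta * D
    return g / num_samples

# Hypothetical example system (assumption, not taken from the paper).
# A and B are used only inside the simulator; the estimator sees costs alone.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), np.eye(1)
K = np.array([[1.0, 1.5]])              # stabilizing for this (A, B)
X0 = np.eye(2)                          # two fixed initial states

def cost(K_query):
    return empirical_cost(A, B, Q, R, K_query, X0, horizon=60)

rng = np.random.default_rng(0)
grad_hat = zeroth_order_gradient(cost, K, radius=0.01, num_samples=200, rng=rng)
```

Under these assumptions, one SGD iteration would take the form `K = K - step_size * grad_hat`, with the step size and radius chosen small enough that the perturbed gains stay stabilizing.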

Taxonomy

Topics: Adaptive Dynamic Programming Control · Reinforcement Learning in Robotics · Model Reduction and Neural Networks