Sample-Efficient Model-Free Policy Gradient Methods for Stochastic LQR via Robust Linear Regression

Bowen Song; Sebastien Gros; Andrea Iannelli

arXiv:2512.03764 · eess.SY · May 11, 2026

TL;DR

This paper introduces policy gradient algorithms for stochastic LQR that estimate gradients via robust linear regression, establishes sample-efficient convergence guarantees for them, and validates these guarantees with numerical experiments.

Contribution

It develops a primal-dual estimation scheme that yields unbiased gradient estimates from noisy data, enabling efficient policy optimization for unknown stochastic linear systems.
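The errors-in-variables problem that this scheme targets can be seen in a scalar example. The sketch below is a textbook instrumental-variable illustration, not the paper's primal-dual estimation procedure. It assumes a hypothetical system x_{t+1} = a x_t + w_t observed as y_t = x_t + v_t, and all function names are made up for illustration. Regressing y_{t+1} on the noisy y_t by ordinary least squares shrinks the estimate toward zero, whereas using the lagged measurement y_{t-1} as an instrument removes the bias, because y_{t-1} is correlated with x_t but independent of the noise terms that enter the regression.

```python
import numpy as np


def simulate(a, sigma_w, sigma_v, T, seed=0):
    """Hypothetical scalar system x_{t+1} = a x_t + w_t, measured as y_t = x_t + v_t."""
    rng = np.random.default_rng(seed)
    w = sigma_w * rng.standard_normal(T - 1)  # process noise
    x = np.zeros(T)
    for t in range(T - 1):
        x[t + 1] = a * x[t] + w[t]
    return x + sigma_v * rng.standard_normal(T)  # add measurement noise


def ols_estimate(y):
    # Least squares of y_{t+1} on y_t: the regressor carries v_t, so the
    # estimate is attenuated toward zero (errors-in-variables bias).
    return np.dot(y[:-1], y[1:]) / np.dot(y[:-1], y[:-1])


def iv_estimate(y):
    # Instrumental variable: y_{t-1} is correlated with the true x_t but
    # uncorrelated with w_t, v_t and v_{t+1}, so the ratio is consistent for a.
    z, regressor, target = y[:-2], y[1:-1], y[2:]
    return np.dot(z, target) / np.dot(z, regressor)


if __name__ == "__main__":
    y = simulate(a=0.8, sigma_w=1.0, sigma_v=1.0, T=200_000)
    print("OLS:", ols_estimate(y), "IV:", iv_estimate(y))
```

With a = 0.8 and unit noise variances, the least-squares estimate tends to a·σx²/(σx² + σv²) with σx² = 1/(1 − a²), which is about 0.59, while the instrumental-variable estimate tends to the true value 0.8.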

Findings

01. Convergence guarantees with sample complexity O(1/ε)
02. Effective policy gradient algorithms for stochastic LQR
03. Numerical experiments confirm theoretical results

Abstract

Policy gradient algorithms are widely used in reinforcement learning and belong to the class of approximate dynamic programming methods. This paper studies two key policy gradient algorithms, the Natural Policy Gradient and the Gauss-Newton Method, for solving the Linear Quadratic Regulator (LQR) problem in unknown stochastic linear systems. The main challenge is obtaining an unbiased gradient estimate from noisy data, because the underlying linear regression suffers from errors-in-variables. The paper addresses this issue with a primal-dual estimation procedure. Using this novel gradient estimation scheme, it establishes convergence guarantees with a sample complexity of order O(1/ε). Numerical experiments support the theoretical results and demonstrate the effectiveness of the proposed algorithms.
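For orientation, the two algorithms named in the abstract can be written in a model-based form that is common in the LQR policy gradient literature: with policy u = -Kx, define E_K = (R + B'P_K B)K - B'P_K A, where P_K is the value matrix of gain K, so that the gradient is 2 E_K Σ_K. The Natural Policy Gradient steps along E_K, and the Gauss-Newton Method preconditions it by (R + B'P_K B)^{-1}; with unit step, Gauss-Newton reduces to policy iteration. The sketch below assumes A and B are known, whereas the paper estimates the needed quantities from data. The function names, the example system and the step sizes are hypothetical, the factor 2 is absorbed into the step size, and the paper's own parametrization may differ.

```python
import numpy as np


def value_matrix(A, B, Q, R, K):
    """Solve P = Q + K'RK + (A - BK)' P (A - BK) for a stabilizing gain K."""
    n = A.shape[0]
    Acl = A - B @ K
    if np.max(np.abs(np.linalg.eigvals(Acl))) >= 1.0:
        raise ValueError("K is not stabilizing")
    Qk = Q + K.T @ R @ K
    # Row-major vectorization: vec(Acl' P Acl) = kron(Acl', Acl') vec(P)
    M = np.eye(n * n) - np.kron(Acl.T, Acl.T)
    P = np.linalg.solve(M, Qk.reshape(-1)).reshape(n, n)
    return 0.5 * (P + P.T)  # remove round-off asymmetry


def natural_direction(A, B, Q, R, K):
    """Return E_K = (R + B'PB)K - B'PA and the preconditioner R + B'PB."""
    P = value_matrix(A, B, Q, R, K)
    H_uu = R + B.T @ P @ B
    return H_uu @ K - B.T @ P @ A, H_uu


def natural_policy_gradient(A, B, Q, R, K0, step=0.1, iters=500):
    K = K0.copy()
    for _ in range(iters):
        E, _ = natural_direction(A, B, Q, R, K)
        K = K - step * E  # step along the natural gradient direction
    return K


def gauss_newton(A, B, Q, R, K0, step=1.0, iters=50):
    K = K0.copy()
    for _ in range(iters):
        E, H_uu = natural_direction(A, B, Q, R, K)
        K = K - step * np.linalg.solve(H_uu, E)  # step=1 is policy iteration
    return K


if __name__ == "__main__":
    A = np.array([[0.9, 0.2], [0.0, 0.8]])  # hypothetical, open-loop stable
    B = np.array([[0.0], [1.0]])
    Q, R = np.eye(2), np.eye(1)
    K0 = np.zeros((1, 2))  # stabilizing because A is already stable
    print("NPG:", natural_policy_gradient(A, B, Q, R, K0))
    print("Gauss-Newton:", gauss_newton(A, B, Q, R, K0))
```

Both iterations should reach the same gain, at which E_K vanishes. Gauss-Newton typically needs far fewer iterations because its preconditioner accounts for the curvature in the input direction.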
