Analyzing the Variance of Policy Gradient Estimators for the Linear-Quadratic Regulator
James A. Preiss, Sébastien M. R. Arnold, Chen-Yu Wei, Marius Kloft

TL;DR
This paper studies the variance of the REINFORCE policy gradient estimator in linear-quadratic environments, deriving bounds in terms of the environment and noise parameters and comparing them with the empirical variance observed in simulation.
Contribution
It derives theoretical bounds on the variance of the REINFORCE policy gradient estimator in linear-quadratic settings and compares them against the empirical variance measured in simulation.
Findings
Derived bounds on estimator variance in terms of the environment and noise parameters
Empirical results confirm the accuracy of theoretical predictions
Insights into variance behavior in continuous linear-quadratic control environments
Abstract
We study the variance of the REINFORCE policy gradient estimator in environments with continuous state and action spaces, linear dynamics, quadratic cost, and Gaussian noise. These simple environments allow us to derive bounds on the estimator variance in terms of the environment and noise parameters. We compare the predictions of our bounds to the empirical variance in simulation experiments.
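To make the abstract's setting concrete, the following is a minimal sketch of how one might measure the empirical variance of a REINFORCE gradient sample on a scalar linear-quadratic problem. It is not the authors' code or experimental setup. The scalar dynamics `x' = a*x + b*u`, the Gaussian linear policy `u = k*x + noise`, the standard-normal initial state, all default parameter values, and the function names `reinforce_sample` and `empirical_mean_and_variance` are assumptions chosen for illustration.

```python
import random
import statistics


def reinforce_sample(k, rng, a=0.9, b=0.5, q=1.0, r=0.1, sigma=0.5, horizon=10):
    """Return one REINFORCE sample of d(cost)/dk on a scalar LQ problem.

    Assumed setup (illustrative only, not taken from the paper):
      dynamics: x_next = a*x + b*u
      policy:   u = k*x + sigma*eps, with eps ~ N(0, 1)
      cost:     sum over t of q*x^2 + r*u^2
    """
    x = rng.gauss(0.0, 1.0)  # random initial state (assumption)
    total_cost = 0.0
    score = 0.0  # accumulates d/dk log pi(u_t | x_t) over the trajectory
    for _ in range(horizon):
        noise = sigma * rng.gauss(0.0, 1.0)
        u = k * x + noise
        total_cost += q * x * x + r * u * u
        # For a Gaussian policy: d/dk log N(u; k*x, sigma^2) = (u - k*x) * x / sigma^2
        score += noise * x / sigma**2
        x = a * x + b * u
    # REINFORCE estimate: trajectory cost times summed score function
    return total_cost * score


def empirical_mean_and_variance(k, n_samples=2000, seed=0, **env):
    """Monte Carlo mean and sample variance of the single-trajectory estimator."""
    rng = random.Random(seed)
    samples = [reinforce_sample(k, rng, **env) for _ in range(n_samples)]
    return statistics.fmean(samples), statistics.variance(samples)


if __name__ == "__main__":
    for horizon in (5, 10, 20):
        mean, var = empirical_mean_and_variance(-0.5, horizon=horizon)
        print(f"horizon={horizon:2d}  mean={mean:.3f}  variance={var:.3f}")
```

Sweeping one parameter at a time (here the horizon, but equally `sigma` or `a`) gives an empirical variance curve of the kind one would set against a theoretical bound. The sketch omits baselines and dynamics noise, both of which would change the variance.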
Taxonomy
Topics: Reinforcement Learning in Robotics · Markov Chains and Monte Carlo Methods · Simulation Techniques and Applications
Methods: REINFORCE
