Analyzing the Variance of Policy Gradient Estimators for the Linear-Quadratic Regulator

James A. Preiss; Sébastien M. R. Arnold; Chen-Yu Wei; Marius Kloft

arXiv:1910.01249·cs.LG·October 4, 2019·5 cites

TL;DR

This paper studies the variance of the REINFORCE policy gradient estimator in linear-quadratic environments with Gaussian noise. It derives bounds on that variance in terms of the environment and noise parameters, then compares the bounds with the empirical variance measured in simulation.

Contribution

It derives theoretical bounds on the variance of the REINFORCE policy gradient estimator in linear-quadratic settings and compares their predictions with the empirical variance observed in simulation experiments.

Findings

01 Bounds on the estimator variance, expressed in terms of the environment and noise parameters

02 Simulation experiments that compare the predictions of the bounds with the empirical variance

03 An analysis setting with continuous state and action spaces, linear dynamics, quadratic cost, and Gaussian noise

Abstract

We study the variance of the REINFORCE policy gradient estimator in environments with continuous state and action spaces, linear dynamics, quadratic cost, and Gaussian noise. These simple environments allow us to derive bounds on the estimator variance in terms of the environment and noise parameters. We compare the predictions of our bounds to the empirical variance in simulation experiments.
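
To make the quantity under study concrete, the following is a minimal sketch of how the empirical variance of a REINFORCE gradient estimate could be measured in a simulated linear-quadratic environment. It is not the authors' experimental code. Everything specific here is an assumption for illustration: the function name `reinforce_gradient_samples`, the linear-Gaussian policy u = Kx + noise, the Gaussian initial state, the finite undiscounted horizon, the absence of a baseline, and all numeric values.

```python
import numpy as np


def reinforce_gradient_samples(A, B, Q, R, K, sigma_policy, sigma_process,
                               sigma_x0, horizon, n_rollouts, seed=0):
    """Draw independent single-rollout REINFORCE gradient estimates w.r.t. K.

    Hypothetical helper: dynamics x' = A x + B u + w, cost x'Qx + u'Ru,
    policy u = K x + eps with eps ~ N(0, sigma_policy^2 I).
    """
    rng = np.random.default_rng(seed)
    n, m = B.shape  # state dimension, action dimension
    grads = np.empty((n_rollouts, m, n))
    for i in range(n_rollouts):
        x = sigma_x0 * rng.standard_normal(n)  # Gaussian initial state (assumed)
        total_cost = 0.0
        score = np.zeros((m, n))  # running sum of grad_K log pi(u_t | x_t)
        for _ in range(horizon):
            eps = sigma_policy * rng.standard_normal(m)  # exploration noise
            u = K @ x + eps
            total_cost += x @ Q @ x + u @ R @ u
            # For a Gaussian policy, grad_K log pi(u|x) = (u - Kx) x^T / sigma^2.
            score += np.outer(eps, x) / sigma_policy**2
            x = A @ x + B @ u + sigma_process * rng.standard_normal(n)
        # Plain REINFORCE: total trajectory cost times the summed score.
        grads[i] = total_cost * score
    return grads


# Scalar example system; all values are illustrative assumptions.
A = np.array([[0.9]])
B = np.array([[1.0]])
Q = np.array([[1.0]])
R = np.array([[1.0]])
K = np.array([[-0.5]])

grads = reinforce_gradient_samples(A, B, Q, R, K, sigma_policy=0.5,
                                   sigma_process=0.1, sigma_x0=1.0,
                                   horizon=10, n_rollouts=2000)
mean_grad = grads.mean(axis=0)         # Monte Carlo estimate of the gradient
var_grad = grads.var(axis=0, ddof=1)   # per-entry empirical variance
print("mean gradient estimate:", mean_grad)
print("empirical variance:", var_grad)
```

Rerunning the sketch while sweeping one parameter at a time, such as `sigma_policy` or `horizon`, gives empirical variance curves of the kind that analytical bounds can be compared against.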

Taxonomy

Topics: Reinforcement Learning in Robotics · Markov Chains and Monte Carlo Methods · Simulation Techniques and Applications

Methods: REINFORCE