An Analysis of Measure-Valued Derivatives for Policy Gradients

Joao Carvalho; Jan Peters

arXiv:2203.03917·cs.LG·March 9, 2022



TL;DR

This paper explores the Measure-Valued Derivative as a low-variance, unbiased gradient estimator for policy gradients in reinforcement learning, applicable to both differentiable and non-differentiable function approximators.

Contribution

It introduces and empirically evaluates the Measure-Valued Derivative estimator as a versatile alternative to likelihood-ratio and reparametrization tricks in policy gradient methods.
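To make the comparison concrete, here is a minimal sketch of the three estimators for the simplest case. It estimates the gradient of E_{x~N(mu, sigma^2)}[f(x)] with respect to the Gaussian mean mu. The test function f(x) = x^2 is assumed for illustration, and its exact gradient is 2*mu.

For the mean, the derivative of the Gaussian density splits into a positive and a negative part. This gives d/dmu E[f(x)] = (1/(sigma*sqrt(2*pi))) * (E[f(mu + sigma*z)] - E[f(mu - sigma*z)]) with z ~ Rayleigh(1). Reusing the same z on both sides (coupling) is a variance-reduction choice made in this sketch. It is not necessarily the configuration used in the paper.

```python
import numpy as np

def f(x):
    # Assumed test function: E[x^2] = mu^2 + sigma^2, so d/dmu E[f(x)] = 2 * mu.
    return x ** 2

def df(x):
    # Analytic derivative of f; only the reparametrization estimator needs it.
    return 2.0 * x

def lr_terms(mu, sigma, n, rng):
    # Likelihood-ratio (score-function) terms: f(x) * d/dmu log N(x | mu, sigma^2).
    x = rng.normal(mu, sigma, size=n)
    return f(x) * (x - mu) / sigma ** 2

def reparam_terms(mu, sigma, n, rng):
    # Reparametrization terms: x = mu + sigma * eps, so dx/dmu = 1 and the term is f'(x).
    eps = rng.standard_normal(n)
    return df(mu + sigma * eps)

def mvd_terms(mu, sigma, n, rng):
    # Measure-valued derivative terms for the Gaussian mean, coupled through a shared z.
    z = rng.rayleigh(scale=1.0, size=n)
    c = 1.0 / (sigma * np.sqrt(2.0 * np.pi))
    return c * (f(mu + sigma * z) - f(mu - sigma * z))

rng = np.random.default_rng(0)
mu, sigma, n = 1.0, 1.0, 200_000
results = {}
for name, terms_fn in [("likelihood_ratio", lr_terms),
                       ("reparametrization", reparam_terms),
                       ("mvd", mvd_terms)]:
    terms = terms_fn(mu, sigma, n, rng)
    results[name] = (terms.mean(), terms.std())
    print(f"{name:>17s}: grad={terms.mean():.3f}  per-sample std={terms.std():.2f}")
```

All three averages should land near the true value 2.0. For this particular f, the analytic per-sample standard deviations are about 5.5 (likelihood ratio), 2.0 (reparametrization) and 1.05 (coupled MVD). That ordering is specific to this test function and coupling choice, not a general guarantee.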

Findings

1. Achieves performance comparable to existing methods across various action-space dimensions.
2. Can be used with non-differentiable function approximators.
3. Offers low-variance, unbiased gradient estimates.
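The second finding can be illustrated with a deliberately non-differentiable function. This is a minimal sketch, assuming a step function f(x) = 1[x > 0] as a stand-in for a non-differentiable approximator, and a Gaussian whose mean we differentiate.

The pointwise derivative of the step is zero almost everywhere, so a reparametrization estimate collapses to 0. The MVD only evaluates f, so it still recovers the exact gradient pdf(mu/sigma)/sigma.

```python
import numpy as np
from math import exp, pi, sqrt

def step(x):
    # Non-differentiable test function: 1.0 where x > 0, else 0.0.
    return (x > 0).astype(float)

def step_pathwise_grad(x):
    # Pointwise derivative of the step: zero everywhere except x = 0, where it is undefined.
    return np.zeros_like(x)

def mvd_mean_grad(f, mu, sigma, n, rng):
    # MVD for the Gaussian mean: only evaluations of f are needed, never its derivative.
    z = rng.rayleigh(scale=1.0, size=n)
    c = 1.0 / (sigma * np.sqrt(2.0 * np.pi))
    return c * np.mean(f(mu + sigma * z) - f(mu - sigma * z))

def reparam_mean_grad(df, mu, sigma, n, rng):
    # Reparametrization estimator: averages the pointwise derivative along x = mu + sigma * eps.
    eps = rng.standard_normal(n)
    return np.mean(df(mu + sigma * eps))

rng = np.random.default_rng(1)
mu, sigma, n = 0.5, 1.0, 100_000
# E[step(x)] = Phi(mu / sigma), so the exact gradient is the standard normal pdf at mu / sigma, over sigma.
true_grad = exp(-0.5 * (mu / sigma) ** 2) / (sigma * sqrt(2 * pi))
g_mvd = mvd_mean_grad(step, mu, sigma, n, rng)
g_rep = reparam_mean_grad(step_pathwise_grad, mu, sigma, n, rng)
print(f"true: {true_grad:.4f}  mvd: {g_mvd:.4f}  reparametrization: {g_rep:.4f}")
```

The likelihood-ratio estimator would also stay unbiased here, since it likewise needs only evaluations of f. The sketch contrasts the MVD with the pathwise estimator because that is where differentiability matters.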

Abstract

Reinforcement learning methods for robotics are increasingly successful due to the constant development of better policy gradient techniques. A precise (low variance) and accurate (low bias) gradient estimator is crucial to face increasingly complex tasks. Traditional policy gradient algorithms use the likelihood-ratio trick, which is known to produce unbiased but high variance estimates. More modern approaches exploit the reparametrization trick, which gives lower variance gradient estimates but requires differentiable value function approximators. In this work, we study a different type of stochastic gradient estimator: the Measure-Valued Derivative. This estimator is unbiased, has low variance, and can be used with differentiable and non-differentiable function approximators. We empirically evaluate this estimator in the actor-critic policy gradient setting and show that it can…
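In the actor-critic setting, the actor update needs the gradient of E_{a~pi}[Q(s, a)] with respect to the policy parameters. The sketch below covers one fixed state and the mean of a diagonal Gaussian policy. It is a minimal illustration under those assumptions, not the paper's implementation. The critic q_fn is hypothetical and is treated as a black box that is only evaluated, never differentiated.

For each action dimension i, only that coordinate is replaced by the positive and negative MVD samples. The other coordinates are drawn from the policy and shared between both sides. The helper name mvd_policy_mean_grad is illustrative.

```python
import numpy as np

def mvd_policy_mean_grad(q_fn, mu, sigma, n, rng):
    """MVD estimate of d/dmu E[q_fn(a)] for a ~ N(mu, diag(sigma**2)).

    q_fn maps a batch of actions (n, d) to values (n,). It is only evaluated,
    never differentiated, so any black-box critic can be plugged in.
    """
    d = mu.shape[0]
    grad = np.zeros(d)
    c = 1.0 / (sigma * np.sqrt(2.0 * np.pi))  # per-dimension MVD constants
    for i in range(d):
        # Draw full actions from the current policy; only dimension i is replaced below.
        a = mu + sigma * rng.standard_normal((n, d))
        z = rng.rayleigh(scale=1.0, size=n)
        a_plus, a_minus = a.copy(), a.copy()
        a_plus[:, i] = mu[i] + sigma[i] * z   # positive part of the derivative measure
        a_minus[:, i] = mu[i] - sigma[i] * z  # negative part, coupled through the same z
        grad[i] = c[i] * np.mean(q_fn(a_plus) - q_fn(a_minus))
    return grad

# Hypothetical critic with a known answer: Q(a) = -||a - target||^2, so the exact
# gradient with respect to mu is -2 * (mu - target), independent of sigma.
target = np.array([1.0, 0.0, 0.0])

def quadratic_critic(actions):
    return -np.sum((actions - target) ** 2, axis=1)

rng = np.random.default_rng(2)
mu = np.array([0.0, 1.0, -0.5])
sigma = np.array([0.5, 1.0, 0.3])
g = mvd_policy_mean_grad(quadratic_critic, mu, sigma, n=50_000, rng=rng)
print("MVD estimate:", np.round(g, 3), " exact:", -2 * (mu - target))
```

As written, this sketch makes two batched critic calls per action dimension. Its cost therefore grows linearly with the action-space dimension, which is a consideration when comparing it to single-evaluation estimators.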


Taxonomy

Topics: Adversarial Robustness in Machine Learning · Reinforcement Learning in Robotics