Variance Reduction for Score Functions Using Optimal Baselines

Ronan Keane; H. Oliver Gao

arXiv:2212.13587 · cs.LG · December 29, 2022

TL;DR

This paper derives the optimal state-dependent baseline for variance reduction in score function gradient estimators, providing theoretical insight into the effectiveness of baselines in reinforcement learning.

Contribution

The paper derives the first expression for the optimal state-dependent baseline, the baseline that minimizes the variance of a score function gradient estimator, and compares it with value function baselines.

Findings

01 There exist examples where the optimal baseline may be arbitrarily better than a value function baseline.

02 The value function baseline usually performs similarly to the optimal baseline in variance reduction.

03 Using the value function for bootstrapping further reduces variance.

Abstract

Many problems involve the use of models which learn probability distributions or incorporate randomness in some way. In such problems, because computing the true expected gradient may be intractable, a gradient estimator is used to update the model parameters. When the model parameters directly affect a probability distribution, the gradient estimator will involve score function terms. This paper studies baselines, a variance reduction technique for score functions. Motivated primarily by reinforcement learning, we derive for the first time an expression for the optimal state-dependent baseline, the baseline which results in a gradient estimator with minimum variance. Although we show that there exist examples where the optimal baseline may be arbitrarily better than a value function baseline, we find that the value function baseline usually performs similarly to an optimal baseline in…
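To make the idea of a baseline concrete, here is a minimal sketch of a toy problem that is not taken from the paper. It estimates the gradient of E[f(x)] with respect to theta for x ~ N(theta, 1) using the score function, and compares three constant baselines: none, the mean of f (loosely analogous to a value function baseline), and the classical scalar optimal baseline b* = E[f s^2] / E[s^2], where s is the score. The objective f(x) = x^2, the function name `compare_baselines`, and all parameter values are assumptions chosen for illustration. The state-dependent expression derived in the paper is not reproduced here.

```python
import numpy as np


def compare_baselines(theta=1.0, n_samples=200_000, seed=0):
    """Compare score function gradient estimates of d/dtheta E[f(x)], x ~ N(theta, 1).

    Returns {baseline_name: (mean of per-sample gradients, their variance)}.
    """
    rng = np.random.default_rng(seed)

    def f(x):
        return x ** 2  # toy objective (an assumption for this sketch)

    # Pilot batch: estimate the baselines on independent samples so that the
    # estimators evaluated below stay unbiased.
    x_pilot = rng.normal(theta, 1.0, size=n_samples)
    s_pilot = x_pilot - theta  # score: d/dtheta log N(x; theta, 1)
    b_mean = f(x_pilot).mean()  # estimate of E[f]
    b_opt = np.mean(f(x_pilot) * s_pilot ** 2) / np.mean(s_pilot ** 2)

    # Evaluation batch, shared by all three estimators.
    x = rng.normal(theta, 1.0, size=n_samples)
    score = x - theta
    results = {}
    for name, b in [("none", 0.0), ("mean", b_mean), ("optimal", b_opt)]:
        g = (f(x) - b) * score  # per-sample gradient estimate
        results[name] = (float(g.mean()), float(g.var()))
    return results


if __name__ == "__main__":
    for name, (mean, var) in compare_baselines().items():
        print(f"{name:8s} mean={mean:.3f} var={var:.2f}")
```

For this toy problem the true gradient is 2·theta = 2, and a direct calculation gives per-sample variances of 30 with no baseline, 18 with the mean baseline, and 14 with the scalar optimal baseline, so the sample estimates should land close to those values. All three estimators target the same gradient; only their variance differs.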

Code & Models

Repositories

ronan-keane/ppo-tf2 · tf · Official

Taxonomy

Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Machine Learning and Algorithms