Variance Reduction for Score Functions Using Optimal Baselines
Ronan Keane, H. Oliver Gao

TL;DR
This paper derives the optimal state-dependent baseline for variance reduction in score function gradient estimators, providing theoretical insights into their effectiveness in reinforcement learning.
Contribution
It introduces the first expression for the optimal baseline that minimizes variance in score function estimators, and compares it with value function baselines.
Findings
Optimal baseline can significantly outperform value function baseline in some cases.
Value function baseline generally performs similarly to the optimal baseline in variance reduction.
Using the value function for bootstrapping further reduces variance.
Abstract
Many problems involve the use of models which learn probability distributions or incorporate randomness in some way. In such problems, because computing the true expected gradient may be intractable, a gradient estimator is used to update the model parameters. When the model parameters directly affect a probability distribution, the gradient estimator will involve score function terms. This paper studies baselines, a variance reduction technique for score functions. Motivated primarily by reinforcement learning, we derive for the first time an expression for the optimal state-dependent baseline, the baseline which results in a gradient estimator with minimum variance. Although we show that there exist examples where the optimal baseline may be arbitrarily better than a value function baseline, we find that the value function baseline usually performs similarly to an optimal baseline in…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsReinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Machine Learning and Algorithms
