Backpropagation through the Void: Optimizing control variates for black-box gradient estimation
Will Grathwohl, Dami Choi, Yuhuai Wu, Geoffrey Roeder, David Duvenaud

TL;DR
This paper presents a framework for learning low-variance, unbiased gradient estimators for black-box functions by training a neural-network control variate, improving optimization of discrete latent-variable models and reinforcement learning policies.
Contribution
It introduces a novel method for optimizing control variates to produce unbiased, low-variance gradient estimates for black-box functions, applicable to both discrete and continuous problems.
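To make the idea concrete, here is a minimal sketch of a learned control variate for a continuous, reparameterizable variable. As we understand the paper, this follows the form of the estimator it calls LAX. The following are illustrative assumptions and do not come from this page: b = theta + eps with eps ~ N(0, 1); a black-box loss f(b) = (b - 2)^2 that is only evaluated; and a hypothetical quadratic surrogate c_phi(b) = phi0 + phi1*b + phi2*b^2 that stands in for the neural network. The estimator g_hat = [f(b) - c_phi(b)] * d/dtheta log p(b|theta) + d/dtheta c_phi(b) is unbiased for any phi. Because of that, phi can be trained by stochastic gradient descent on g_hat^2, whose expectation differs from Var(g_hat) only by a phi-independent constant.

```python
import random

TARGET = 2.0  # hypothetical target inside the toy black-box loss


def f(b):
    # Black-box loss: the estimator only evaluates it, never differentiates it.
    return (b - TARGET) ** 2


def lax_sample(theta, phi, rng):
    """Draw one LAX-style gradient sample for b = theta + eps, eps ~ N(0, 1).

    Returns (g_hat, dg_dphi), where dg_dphi is the gradient of g_hat w.r.t. phi.
    """
    eps = rng.gauss(0.0, 1.0)
    b = theta + eps                        # reparameterized sample, db/dtheta = 1
    score = eps                            # d/dtheta log N(b; theta, 1) = b - theta
    phi0, phi1, phi2 = phi
    c = phi0 + phi1 * b + phi2 * b * b     # hypothetical surrogate c_phi(b)
    dc_dtheta = phi1 + 2.0 * phi2 * b      # reparameterization gradient of c_phi
    g_hat = (f(b) - c) * score + dc_dtheta
    # g_hat is linear in phi, so its phi-gradient can be written by hand here.
    dg_dphi = [-score, 1.0 - b * score, 2.0 * b - b * b * score]
    return g_hat, dg_dphi


def train_control_variate(theta, steps=2000, batch=100, lr=0.003, seed=0):
    """Minimize a Monte Carlo estimate of E[g_hat^2] over phi with plain SGD."""
    rng = random.Random(seed)
    phi = [0.0, 0.0, 0.0]
    for _ in range(steps):
        grad = [0.0, 0.0, 0.0]
        for _ in range(batch):
            g_hat, dg = lax_sample(theta, phi, rng)
            for k in range(3):
                grad[k] += 2.0 * g_hat * dg[k] / batch  # d(g_hat^2)/dphi_k
        phi = [p - lr * gk for p, gk in zip(phi, grad)]
    return phi


def estimator_stats(theta, phi, n=50000, seed=1):
    """Sample mean and variance of the gradient estimator for a fixed phi."""
    rng = random.Random(seed)
    samples = [lax_sample(theta, phi, rng)[0] for _ in range(n)]
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / (n - 1)
    return mean, var


if __name__ == "__main__":
    theta = 0.5
    true_grad = 2.0 * (theta - TARGET)  # d/dtheta E[(theta + eps - TARGET)^2]
    print("true gradient:", true_grad)
    print("score function only (phi = 0):", estimator_stats(theta, [0.0, 0.0, 0.0]))
    phi = train_control_variate(theta)
    print("trained phi:", phi)
    print("with trained control variate:", estimator_stats(theta, phi))
```

With phi = [0, 0, 0] the sketch reduces to the plain score-function (REINFORCE) estimator, which gives a convenient baseline for comparison. The phi-gradient is written out by hand only because this toy surrogate is linear in phi. A neural-network surrogate would need automatic differentiation instead.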
Findings
Effective in training discrete latent-variable models
Provides an unbiased, action-conditional extension of advantage actor-critic
Demonstrates improved gradient estimation in black-box optimization
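The action-conditional extension of advantage actor-critic can be summarized roughly as follows. This is our reading of the paper for continuous, reparameterizable actions a_t, so check it against the paper rather than treating it as a quotation. In a standard advantage actor-critic, the learned baseline depends only on the state. Here the control variate c_phi depends on both the state and the action. The derivative of c_phi with respect to theta is taken through the reparameterized action a_t, and phi is trained to reduce the variance of the estimator.

```latex
\hat{g}_{\mathrm{RL}}
  = \sum_{t=1}^{T} \left(
      \frac{\partial \log \pi(a_t \mid s_t, \theta)}{\partial \theta}
      \left[ \sum_{t'=t}^{T} r_{t'} - c_\phi(a_t, s_t) \right]
      + \frac{\partial}{\partial \theta} c_\phi(a_t, s_t)
    \right)
```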
Abstract
Gradient-based optimization is the foundation of deep learning and reinforcement learning. Even when the mechanism being optimized is unknown or not differentiable, optimization using high-variance or biased gradient estimates is still often the best strategy. We introduce a general framework for learning low-variance, unbiased gradient estimators for black-box functions of random variables. Our method uses gradients of a neural network trained jointly with model parameters or policies, and is applicable in both discrete and continuous settings. We demonstrate this framework for training discrete latent-variable models. We also give an unbiased, action-conditional extension of the advantage actor-critic reinforcement learning algorithm.
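For the discrete case, below is a minimal sketch of the RELAX-style estimator, as we understand the paper's construction, for a single variable b ~ Bernoulli(theta). The following are illustrative assumptions and do not come from this page:

- the continuous relaxation z = logit(theta) + logit(u), with b = 1[z > 0];
- a conditional sample z_tilde, obtained by redrawing u inside the region implied by b;
- the toy loss f(b) = (b - 0.45)^2;
- a hypothetical two-parameter surrogate c_phi(z) = phi0 + phi1 * sigmoid(z), used in place of the jointly trained neural network.

The estimator g_hat = [f(b) - c_phi(z_tilde)] * d/dtheta log p(b|theta) + d/dtheta c_phi(z) - d/dtheta c_phi(z_tilde) is unbiased for any phi, because the three terms involving c_phi cancel in expectation. As a result, phi is again fit by SGD on g_hat^2.

```python
import math
import random

TARGET = 0.45  # hypothetical target in the toy loss


def f(b):
    # Black-box loss on a discrete sample: only evaluated, never differentiated.
    return (b - TARGET) ** 2


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    return math.log(p) - math.log(1.0 - p)


def open_uniform(rng):
    # Uniform draw clamped away from 0 and 1 so logit() stays finite.
    return min(max(rng.random(), 1e-12), 1.0 - 1e-12)


def relax_sample(theta, phi, rng):
    """One RELAX-style gradient sample for b ~ Bernoulli(theta).

    Uses the hypothetical surrogate c_phi(z) = phi0 + phi1 * sigmoid(z).
    Returns (g_hat, dg_dphi), where dg_dphi is the gradient of g_hat w.r.t. phi.
    """
    u, v = open_uniform(rng), open_uniform(rng)
    dz = 1.0 / (theta * (1.0 - theta))      # dz/dtheta with u held fixed
    z = logit(theta) + logit(u)
    b = 1.0 if z > 0.0 else 0.0             # b = H(z), i.e. u > 1 - theta
    # Conditional sample z_tilde ~ p(z | b, theta): redraw u given b.
    if b == 1.0:
        u_t, du_t = 1.0 - theta + v * theta, -(1.0 - v)
    else:
        u_t, du_t = v * (1.0 - theta), -v
    z_t = logit(theta) + logit(u_t)
    dz_t = dz + du_t / (u_t * (1.0 - u_t))  # chain rule through u_t(theta)

    score = b / theta - (1.0 - b) / (1.0 - theta)  # d/dtheta log p(b | theta)
    s, s_t = sigmoid(z), sigmoid(z_t)
    ds = s * (1.0 - s) * dz                 # d sigmoid(z) / dtheta
    ds_t = s_t * (1.0 - s_t) * dz_t         # d sigmoid(z_tilde) / dtheta
    phi0, phi1 = phi
    c_t = phi0 + phi1 * s_t                 # c_phi(z_tilde)
    g_hat = (f(b) - c_t) * score + phi1 * ds - phi1 * ds_t
    dg_dphi = [-score, -s_t * score + ds - ds_t]
    return g_hat, dg_dphi


def train_control_variate(theta, steps=1000, batch=64, lr=0.02, seed=0):
    """Minimize a Monte Carlo estimate of E[g_hat^2] over phi with plain SGD."""
    rng = random.Random(seed)
    phi = [0.0, 0.0]
    for _ in range(steps):
        grad = [0.0, 0.0]
        for _ in range(batch):
            g_hat, dg = relax_sample(theta, phi, rng)
            for k in range(2):
                grad[k] += 2.0 * g_hat * dg[k] / batch  # d(g_hat^2)/dphi_k
        phi = [p - lr * gk for p, gk in zip(phi, grad)]
    return phi


def estimator_stats(theta, phi, n=50000, seed=1):
    """Sample mean and variance of the gradient estimator for a fixed phi."""
    rng = random.Random(seed)
    samples = [relax_sample(theta, phi, rng)[0] for _ in range(n)]
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / (n - 1)
    return mean, var


if __name__ == "__main__":
    theta = 0.3
    print("true gradient:", f(1.0) - f(0.0))
    print("score function only (phi = 0):", estimator_stats(theta, [0.0, 0.0]))
    phi = train_control_variate(theta)
    print("trained phi:", phi)
    print("with trained control variate:", estimator_stats(theta, phi))
```

In this toy problem the true gradient is f(1) - f(0) = 0.1 for every theta. For a single Bernoulli variable, a well-chosen constant phi0 alone can already drive the variance to zero. The example therefore mainly checks unbiasedness and the training loop, not the benefit of the relaxation.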
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Model Reduction and Neural Networks
