Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation
Yoshua Bengio, Nicholas Léonard, Aaron Courville

TL;DR
This paper investigates methods for estimating gradients through stochastic neurons, comparing four approaches, and demonstrates their application in conditional computation to enable sparse, efficient neural networks.
Contribution
The paper introduces a new gradient estimator that decomposes a stochastic binary neuron into a stochastic binary part and a smooth differentiable part, and compares it with existing methods in the context of conditional computation.
Findings
The proposed decomposed estimator approximates, to first order, the expected effect of a pure stochastic binary neuron.
The straight-through estimator provides a heuristic but effective gradient approximation.
Conditional stochastic gating can significantly reduce computational costs in deep networks.
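The straight-through idea in the findings above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the authors' implementation: the gradient with respect to the sampled binary output is copied unchanged as the estimate of the gradient with respect to the sigmoid argument. The names `forward`, `straight_through_grad`, and `loss_and_grad_h`, as well as the squared-error loss, are assumptions made for this example.

```python
import math
import random


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def forward(a, rng):
    """Sample a binary output h ~ Bernoulli(sigmoid(a))."""
    p = sigmoid(a)
    h = 1.0 if rng.random() < p else 0.0
    return h, p


def straight_through_grad(grad_h):
    """Heuristic backward pass: treat the sampling step as the identity,
    so the gradient w.r.t. h is reused as the estimate of dL/da."""
    return grad_h


def loss_and_grad_h(h, target):
    """Hypothetical downstream loss (squared error), differentiable in h."""
    return (h - target) ** 2, 2.0 * (h - target)


rng = random.Random(0)                 # fixed seed for reproducibility
h, p = forward(0.3, rng)               # stochastic forward pass
loss, grad_h = loss_and_grad_h(h, target=1.0)
grad_a = straight_through_grad(grad_h)  # biased but cheap estimate of dL/da
```

The estimate is biased in general, because the sampling step is not actually the identity, which is why the summary describes it as a heuristic.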
Abstract
Stochastic neurons and hard non-linearities can be useful for a number of reasons in deep learning models, but in many cases they pose a challenging problem: how to estimate the gradient of a loss function with respect to the input of such stochastic or non-smooth neurons? I.e., can we "back-propagate" through these stochastic neurons? We examine this question, existing approaches, and compare four families of solutions, applicable in different settings. One of them is the minimum variance unbiased gradient estimator for stochastic binary neurons (a special case of the REINFORCE algorithm). A second approach, introduced here, decomposes the operation of a binary stochastic neuron into a stochastic binary part and a smooth differentiable part, which approximates the expected effect of the pure stochastic binary neuron to first order. A third approach involves the injection of additive or…
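The REINFORCE-style estimator mentioned in the abstract can be illustrated for a single stochastic binary neuron. The sketch below assumes h ~ Bernoulli(sigmoid(a)) and uses the estimate (h - sigmoid(a)) * L, whose expectation equals the derivative of the expected loss with respect to a. It is not the minimum-variance variant: the `baseline` argument is a plain constant, and any constant keeps the estimate unbiased because E[h - sigmoid(a)] = 0. The function names and the two-valued `toy_loss` are hypothetical, chosen only for this example.

```python
import math
import random


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def reinforce_grad(h, p, loss, baseline=0.0):
    """Single-sample estimate of d E[loss] / d a for h ~ Bernoulli(p), p = sigmoid(a)."""
    return (h - p) * (loss - baseline)


def toy_loss(h):
    """Hypothetical loss that depends only on the binary output h."""
    return 3.0 if h == 1.0 else 1.0


def exact_grad(a):
    """Analytic d/da of E[loss] = p * L(1) + (1 - p) * L(0), using dp/da = p * (1 - p)."""
    p = sigmoid(a)
    return p * (1.0 - p) * (toy_loss(1.0) - toy_loss(0.0))


def expected_estimate(a, baseline=0.0):
    """Exact expectation of the estimator, enumerating both outcomes of h."""
    p = sigmoid(a)
    return (p * reinforce_grad(1.0, p, toy_loss(1.0), baseline)
            + (1.0 - p) * reinforce_grad(0.0, p, toy_loss(0.0), baseline))


def monte_carlo_estimate(a, n, seed=0):
    """Average of n single-sample estimates, with a fixed seed."""
    rng = random.Random(seed)
    p = sigmoid(a)
    total = 0.0
    for _ in range(n):
        h = 1.0 if rng.random() < p else 0.0
        total += reinforce_grad(h, p, toy_loss(h))
    return total / n
```

Comparing `expected_estimate(a)` with `exact_grad(a)` for a few values of `a` is a quick way to check the unbiasedness claim, while `monte_carlo_estimate` shows how noisy individual samples are.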
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Applications · Markov Chains and Monte Carlo Methods
Methods: REINFORCE
