Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation

Yoshua Bengio; Nicholas Léonard; Aaron Courville

arXiv:1308.3432·cs.LG·August 16, 2013·2.0k cites

TL;DR

This paper investigates methods for estimating gradients through stochastic neurons, comparing four approaches, and demonstrates their application in conditional computation to enable sparse, efficient neural networks.

Contribution

It introduces a new gradient estimator based on decomposing stochastic binary neurons and compares it with existing methods in the context of conditional computation.

Findings

01. The proposed decomposed estimator approximates the expected effect of a pure stochastic binary neuron to first order.

02. The straight-through estimator provides a heuristic but effective gradient approximation.

03. Conditional stochastic gating can significantly reduce computational costs in deep networks.

Abstract

Stochastic neurons and hard non-linearities can be useful for a number of reasons in deep learning models, but in many cases they pose a challenging problem: how to estimate the gradient of a loss function with respect to the input of such stochastic or non-smooth neurons? I.e., can we "back-propagate" through these stochastic neurons? We examine this question, existing approaches, and compare four families of solutions, applicable in different settings. One of them is the minimum variance unbiased gradient estimator for stochastic binary neurons (a special case of the REINFORCE algorithm). A second approach, introduced here, decomposes the operation of a binary stochastic neuron into a stochastic binary part and a smooth differentiable part, which approximates the expected effect of the pure stochastic binary neuron to first order. A third approach involves the injection of additive or…
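
As a minimal sketch of the kind of comparison the abstract describes, the Python below contrasts a REINFORCE-style estimate with a straight-through estimate for a single stochastic binary neuron h ~ Bernoulli(sigmoid(a)). The squared-error loss, the target value, the function names, and the sample count are all illustrative assumptions rather than details taken from the paper, and the straight-through variant shown (treating the sampling step as the identity in the backward pass) is one common reading of that heuristic.

```python
import math
import random


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def loss(h, target=1.0):
    # Hypothetical toy loss on the neuron output (an assumption for illustration).
    return (h - target) ** 2


def exact_gradient(a, target=1.0):
    # d/da E[loss(h)] with h ~ Bernoulli(p), p = sigmoid(a):
    # E[loss] = p*loss(1) + (1-p)*loss(0), and dp/da = p*(1-p).
    p = sigmoid(a)
    return p * (1.0 - p) * (loss(1.0, target) - loss(0.0, target))


def sample_neuron(a, rng):
    # Draw the binary output h and also return the firing probability p.
    p = sigmoid(a)
    h = 1.0 if rng.random() < p else 0.0
    return h, p


def reinforce_estimate(a, rng, target=1.0, baseline=0.0):
    # REINFORCE-style estimate: (h - p) * (loss - baseline).
    # It is unbiased for any baseline that does not depend on h.
    h, p = sample_neuron(a, rng)
    return (h - p) * (loss(h, target) - baseline)


def straight_through_estimate(a, rng, target=1.0):
    # Straight-through sketch: use dloss/dh as the gradient w.r.t. a,
    # as if the sampling step were the identity. Biased in general.
    h, _ = sample_neuron(a, rng)
    return 2.0 * (h - target)


def mean_estimate(estimator, a, n_samples=100_000, seed=0):
    # Average many single-sample estimates with a fixed seed.
    rng = random.Random(seed)
    return sum(estimator(a, rng) for _ in range(n_samples)) / n_samples


if __name__ == "__main__":
    a = 0.5
    print("exact           :", exact_gradient(a))
    print("REINFORCE (mean):", mean_estimate(reinforce_estimate, a))
    print("straight-through:", mean_estimate(straight_through_estimate, a))
```

Under these assumptions, the averaged REINFORCE samples should approach the exact gradient of the expected loss. The straight-through average generally does not match it in magnitude, although in this toy setup with a target of 1 it has the same sign.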

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Applications · Markov Chains and Monte Carlo Methods

Methods: REINFORCE