The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables
Chris J. Maddison, Andriy Mnih, Yee Whye Teh

TL;DR
This paper introduces the Concrete distribution, a continuous relaxation of discrete random variables. The relaxation allows low-variance, though biased, gradient estimation through the discrete nodes of a stochastic computation graph, and is evaluated on density estimation and structured prediction.
Contribution
The paper presents the Concrete distribution with closed-form densities and a simple reparameterization, facilitating gradient-based optimization of discrete variables.
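The reparameterization mentioned above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes the standard construction in which Gumbel(0, 1) noise is added to the log-parameters and the result is passed through a softmax with temperature. The names `sample_concrete`, `log_alpha`, and `temperature` are chosen here for illustration.

```python
import numpy as np

def sample_concrete(log_alpha, temperature, rng, size=1):
    """Draw `size` relaxed samples on the simplex (illustrative sketch)."""
    log_alpha = np.asarray(log_alpha, dtype=float)
    # Gumbel(0, 1) noise via the inverse CDF: G = -log(-log U), U ~ Uniform(0, 1).
    # The lower bound 1e-12 keeps log(u) finite.
    u = rng.uniform(low=1e-12, high=1.0, size=(size, log_alpha.shape[0]))
    gumbel = -np.log(-np.log(u))
    # The sample is a deterministic, differentiable function of log_alpha
    # and of noise whose distribution does not depend on the parameters.
    logits = (log_alpha + gumbel) / temperature
    # Numerically stable softmax over the last axis.
    logits = logits - logits.max(axis=-1, keepdims=True)
    weights = np.exp(logits)
    return weights / weights.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
alpha = np.array([1.0, 2.0, 7.0])  # unnormalized class weights (assumed example values)
samples = sample_concrete(np.log(alpha), temperature=0.5, rng=rng, size=5)
print(samples)  # each row is a point on the 3-class probability simplex
```

Because the noise is drawn independently of `log_alpha`, an automatic differentiation framework could backpropagate through the softmax to the parameters; NumPy is used here only to show the sampling path.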
Findings
Effective on density estimation tasks using neural networks
Effective on structured prediction tasks using neural networks
Yields low-variance gradient estimates that are biased with respect to the original discrete objective
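To make the bias in the last finding concrete, the sketch below compares exact discrete samples with relaxed ones. It is an assumption-laden illustration: it uses the Gumbel-max trick for the discrete samples and a temperature-scaled softmax of the same noisy scores for the relaxation; the helper `relax` and the chosen temperatures are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = np.array([1.0, 2.0, 7.0])   # assumed example weights
probs = alpha / alpha.sum()
n = 100_000

# Shared Gumbel(0, 1) noise for both the discrete and the relaxed samples.
u = rng.uniform(low=1e-12, high=1.0, size=(n, alpha.shape[0]))
scores = np.log(alpha) - np.log(-np.log(u))

# Gumbel-max trick: the argmax of the noisy scores is an exact categorical sample.
discrete = scores.argmax(axis=1)
freq = np.bincount(discrete, minlength=alpha.shape[0]) / n

def relax(scores, temperature):
    """Softmax of the noisy scores at the given temperature."""
    z = scores / temperature
    z = z - z.max(axis=1, keepdims=True)
    w = np.exp(z)
    return w / w.sum(axis=1, keepdims=True)

# Average size of the largest coordinate: closer to 1 means closer to one-hot.
mean_max = {t: relax(scores, t).max(axis=1).mean() for t in (2.0, 0.5, 0.05)}

print("empirical frequencies:", freq, "target:", probs)
print("mean largest coordinate by temperature:", mean_max)
```

At any positive temperature the relaxed samples are not one-hot, so a loss evaluated on them differs from the discrete loss; that gap is the source of the bias. Lower temperatures shrink the gap, typically at the cost of higher gradient variance.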
Abstract
The reparameterization trick enables optimizing large-scale stochastic computation graphs via gradient descent. The essence of the trick is to refactor each stochastic node into a differentiable function of its parameters and a random variable with fixed distribution. After refactoring, the gradients of the loss propagated by the chain rule through the graph are low-variance, unbiased estimators of the gradients of the expected loss. While many continuous random variables have such reparameterizations, discrete random variables lack useful reparameterizations due to the discontinuous nature of discrete states. In this work we introduce Concrete random variables, which are continuous relaxations of discrete random variables. The Concrete distribution is a new family of distributions with closed-form densities and a simple reparameterization. Whenever a discrete stochastic node of a computation…
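The refactoring described at the start of the abstract can be illustrated with a continuous variable, where it is exact. The sketch below is a toy example chosen for this page, not taken from the paper: it assumes a Gaussian node x = mu + sigma * eps and the loss f(x) = x^2, for which the true gradients of the expected loss are 2*mu and 2*sigma.

```python
import numpy as np

def reparam_gradients(mu, sigma, rng, n=200_000):
    """Monte Carlo gradients of E[x**2] for x = mu + sigma * eps, eps ~ N(0, 1)."""
    eps = rng.standard_normal(n)      # noise with a fixed, parameter-free distribution
    x = mu + sigma * eps              # differentiable function of (mu, sigma)
    df_dx = 2.0 * x                   # derivative of the loss f(x) = x**2
    grad_mu = (df_dx * 1.0).mean()    # chain rule: dx/dmu = 1
    grad_sigma = (df_dx * eps).mean() # chain rule: dx/dsigma = eps
    return grad_mu, grad_sigma

rng = np.random.default_rng(0)
mu, sigma = 1.5, 0.8                  # assumed example parameters
grad_mu, grad_sigma = reparam_gradients(mu, sigma, rng)
print(grad_mu, 2 * mu)                # estimate vs. exact gradient with respect to mu
print(grad_sigma, 2 * sigma)          # estimate vs. exact gradient with respect to sigma
```

Since E[x^2] = mu^2 + sigma^2, the estimates can be checked against the closed form. A discrete node has no such differentiable sampling path, which is the gap the Concrete relaxation is meant to fill.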
Taxonomy
Topics: Geotechnical Engineering and Analysis
