Expectation Propagation performs a smoothed gradient descent
Guillaume P. Dehaene

TL;DR
This paper provides an intuitive understanding of Expectation Propagation (EP) by linking it to gradient descent on a smoothed (convolved) energy landscape, clarifying its relation to other approximation methods.
Contribution
It rigorously relates EP to gradient descent on a smoothed energy landscape and connects it to Gaussian approximations minimizing reverse KL divergence.
Findings
EP is equivalent to gradient descent on a smoothed energy landscape
EP relates to Gaussian approximations minimizing reverse KL divergence
Provides intuitive insights into the workings of EP
Abstract
Bayesian inference is a popular method to build learning algorithms, but it is hampered by the fact that its key object, the posterior probability distribution, is often uncomputable. Expectation Propagation (EP) (Minka, 2001) is a popular algorithm that solves this issue by computing a parametric approximation (e.g., Gaussian) to the density of the posterior. However, while it is known empirically to quickly compute fine approximations, EP is extremely poorly understood, which prevents it from being adopted by a larger fraction of the community. The object of the present article is to shed intuitive light on EP by relating it to other, better-understood methods. More precisely, we link it to using gradient descent to compute the Laplace approximation of a target probability distribution. We show that EP is exactly equivalent to performing gradient descent on a smoothed energy…
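To make the phrase "gradient descent on a smoothed energy" concrete, here is a minimal toy sketch. It is not the EP algorithm and not the paper's construction. It only contrasts plain gradient descent on an energy (which finds the mode used by the Laplace approximation) with gradient descent on the same energy after Gaussian smoothing. The energy, the fixed smoothing width `sigma`, the step size, and all function names are assumptions chosen for illustration.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def grad_energy(x):
    # Hypothetical 1-D energy psi(x) = x**4/4 + x**2/2 - x (a negative
    # log-density up to a constant); this is its derivative.
    return x**3 + x - 1.0


def smoothed_grad(grad_fn, mu, sigma, n_nodes=20):
    # Gradient of the smoothed energy E[psi(mu + sigma * Z)], Z ~ N(0, 1),
    # computed as E[psi'(mu + sigma * Z)] with Gauss-Hermite quadrature.
    nodes, weights = hermegauss(n_nodes)
    weights = weights / np.sqrt(2.0 * np.pi)  # normalise to a probability measure
    return float(np.sum(weights * grad_fn(mu + sigma * nodes)))


def descend(grad_fn, mu0, sigma, step=0.1, n_iter=500):
    # Fixed-step gradient descent on the (possibly smoothed) energy.
    mu = mu0
    for _ in range(n_iter):
        mu -= step * smoothed_grad(grad_fn, mu, sigma)
    return mu


# sigma = 0 recovers plain gradient descent (mode of the target);
# sigma = 1 descends the Gaussian-smoothed landscape instead.
mode_plain = descend(grad_energy, mu0=0.0, sigma=0.0)
mode_smooth = descend(grad_energy, mu0=0.0, sigma=1.0)
print(mode_plain, mode_smooth)
```

For this particular polynomial energy, the smoothed gradient has the closed form mu**3 + (1 + 3 * sigma**2) * mu - 1, so smoothing moves the stationary point rather than merely slowing the descent. In this sketch the smoothing width is held fixed purely for simplicity.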
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Bayesian Methods and Mixture Models · Neural Networks and Applications
