Optimally-Weighted Herding is Bayesian Quadrature
Ferenc Huszar, David Duvenaud

TL;DR
This paper shows that the criterion minimised by kernel herding is equivalent to the posterior variance in Bayesian quadrature, and that sequential Bayesian quadrature is an optimally weighted form of kernel herding that empirically converges faster than O(1/N).
Contribution
It establishes a theoretical link between kernel herding and Bayesian quadrature, and shows that sequential Bayesian quadrature can be viewed as a weighted version of kernel herding that outperforms any other weighted herding method.
Findings
Sequential Bayesian quadrature outperforms unweighted herding.
Sequential Bayesian quadrature empirically achieves a convergence rate faster than O(1/N).
The results imply an upper bound on the empirical error of the Bayesian quadrature estimate.
Abstract
Herding and kernel herding are deterministic methods of choosing samples which summarise a probability distribution. A related task is choosing samples for estimating integrals using Bayesian quadrature. We show that the criterion minimised when selecting samples in kernel herding is equivalent to the posterior variance in Bayesian quadrature. We then show that sequential Bayesian quadrature can be viewed as a weighted version of kernel herding which achieves performance superior to any other weighted herding method. We demonstrate empirically a rate of convergence faster than O(1/N). Our results also imply an upper bound on the empirical error of the Bayesian quadrature estimate.
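The equivalence described in the abstract can be illustrated numerically. The following is a minimal sketch, not the authors' code: it assumes a one-dimensional standard normal target, a Gaussian kernel with an arbitrarily chosen length scale of 0.5, and a finite candidate grid for the herding step. All function and variable names (`herd`, `bq_weights`, `squared_error`, `ELL`, `ZZ`) are hypothetical. It selects points by kernel herding, then compares the squared error criterion under uniform weights with the same criterion under Bayesian quadrature weights, which minimise it for a fixed point set.

```python
import numpy as np

ELL = 0.5  # kernel length scale (assumed value, for illustration only)

def kernel(x, y):
    """Gaussian kernel matrix between 1-D point arrays x and y."""
    d = x[:, None] - y[None, :]
    return np.exp(-0.5 * d**2 / ELL**2)

def kernel_mean(x):
    """z(x) = E_{y ~ N(0,1)}[k(x, y)], available in closed form here."""
    s = ELL**2 + 1.0
    return np.sqrt(ELL**2 / s) * np.exp(-0.5 * x**2 / s)

# E_{x, y ~ N(0,1)}[k(x, y)]: the prior variance of the integral.
ZZ = np.sqrt(ELL**2 / (ELL**2 + 2.0))

def herd(n, candidates):
    """Greedy kernel herding restricted to a finite candidate grid."""
    z = kernel_mean(candidates)
    K = kernel(candidates, candidates)
    running = np.zeros_like(candidates)  # sum_i k(x, x_i) over chosen points
    chosen = []
    for t in range(n):
        # Herding criterion: kernel mean minus averaged kernel to past picks.
        idx = int(np.argmax(z - running / (t + 1)))
        chosen.append(idx)
        running += K[:, idx]
    return candidates[np.array(chosen)]

def squared_error(x, w):
    """Squared MMD between the target and the weighted point set (x, w)."""
    z = kernel_mean(x)
    return ZZ - 2.0 * w @ z + w @ kernel(x, x) @ w

def bq_weights(x, jitter=1e-8):
    """Bayesian quadrature weights K^{-1} z; jitter is a numerical safeguard."""
    K = kernel(x, x) + jitter * np.eye(len(x))
    return np.linalg.solve(K, kernel_mean(x))

candidates = np.linspace(-5.0, 5.0, 2001)
n = 10
samples = herd(n, candidates)
uniform_err = squared_error(samples, np.full(n, 1.0 / n))
bq_err = squared_error(samples, bq_weights(samples))
print("uniform weights:", uniform_err)
print("BQ weights:     ", bq_err)
```

With the Bayesian quadrature weights, the quantity returned by `squared_error` is the posterior variance of the integral estimate, up to the small jitter term. Because those weights minimise the quadratic form, the second value printed cannot meaningfully exceed the first. Sequential Bayesian quadrature would go one step further and also choose each new point to minimise this variance, which the sketch does not do.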
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Model Reduction and Neural Networks · Water Systems and Optimization
