Optimally-Weighted Herding is Bayesian Quadrature
Ferenc Huszár, David Duvenaud

TL;DR
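This paper shows that the criterion minimised by kernel herding is equivalent to the posterior variance in Bayesian quadrature. It then shows that sequential Bayesian quadrature acts as an optimally weighted form of herding, which outperforms any other weighted herding method and empirically converges faster than O(1/N).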
Contribution
It establishes the equivalence between the kernel herding criterion and the Bayesian quadrature posterior variance, and shows that sequential Bayesian quadrature is a weighted version of kernel herding that outperforms any other weighted herding method.
Findings
Sequential Bayesian quadrature, viewed as weighted kernel herding, outperforms any other weighted herding method.
The empirical rate of convergence is faster than O(1/N).
The results imply an upper bound on the empirical error of the Bayesian quadrature estimate.
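The findings above can be written compactly as follows. The notation is an assumption chosen for this summary and may differ from the paper's: $k$ is the kernel, $p$ the target distribution, $z(x) = \int k(x, x')\,p(x')\,dx'$ the kernel mean, $K_{ij} = k(x_i, x_j)$, $\mathbf{z}_i = z(x_i)$, and $\mathcal{H}$ the reproducing kernel Hilbert space of $k$.

```latex
% Squared discrepancy for samples x_1..x_N with weights w_1..w_N
\epsilon^2(\mathbf{w}, X)
  = \iint k(x, x')\,p(x)\,p(x')\,dx\,dx'
    - 2 \sum_{i=1}^{N} w_i\, z(x_i)
    + \sum_{i=1}^{N}\sum_{j=1}^{N} w_i w_j\, k(x_i, x_j)

% Kernel herding uses uniform weights w_i = 1/N.
% Bayesian quadrature uses the minimising weights, and the minimum
% value is the posterior variance of the integral:
\mathbf{w}_{\mathrm{BQ}} = K^{-1}\mathbf{z},
\qquad
\epsilon^2(\mathbf{w}_{\mathrm{BQ}}, X)
  = \iint k(x, x')\,p(x)\,p(x')\,dx\,dx' - \mathbf{z}^{\top} K^{-1} \mathbf{z}

% Error bound for any integrand f in the RKHS (Cauchy-Schwarz):
\left| \int f(x)\,p(x)\,dx - \sum_{i=1}^{N} w_i f(x_i) \right|
  \le \|f\|_{\mathcal{H}}\; \epsilon(\mathbf{w}, X)
```

Because the Bayesian quadrature weights minimise $\epsilon^2$ for a fixed sample set, no other weighting of the same samples can give a smaller bound.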
Abstract
Herding and kernel herding are deterministic methods of choosing samples which summarise a probability distribution. A related task is choosing samples for estimating integrals using Bayesian quadrature. We show that the criterion minimised when selecting samples in kernel herding is equivalent to the posterior variance in Bayesian quadrature. We then show that sequential Bayesian quadrature can be viewed as a weighted version of kernel herding which achieves performance superior to any other weighted herding method. We demonstrate empirically a rate of convergence faster than O(1/N). Our results also imply an upper bound on the empirical error of the Bayesian quadrature estimate.
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Model Reduction and Neural Networks · Control Systems and Identification
