Richer priors for infinitely wide multi-layer perceptrons
Russell Tsuchida, Fred Roosta, Marcus Gallagher

TL;DR
This paper extends the Gaussian process limit of infinitely wide MLPs to more general priors, including non-zero mean and partially exchangeable priors, and empirically demonstrates improved kernel properties and convergence behaviors.
Contribution
It introduces a broader class of priors for deep MLPs, derives their limiting kernels, and empirically shows these priors avoid pathologies of previous models.
Findings
Derived kernels for deep MLPs with richer priors.
Empirical evidence of convergence to the limiting Gaussian process.
Improved kernel properties compared to previous priors.
Abstract
It is well-known that the distribution over functions induced through a zero-mean iid prior distribution over the parameters of a multi-layer perceptron (MLP) converges to a Gaussian process (GP), under mild conditions. We extend this result firstly to independent priors with general zero or non-zero means, and secondly to a family of partially exchangeable priors which generalise iid priors. We discuss how the second prior arises naturally when considering an equivalence class of functions in an MLP and through training processes such as stochastic gradient descent. The model resulting from partially exchangeable priors is a GP, with an additional level of inference in the sense that the prior and posterior predictive distributions require marginalisation over hyperparameters. We derive the kernels of the limiting GP in deep MLPs, and show empirically that these kernels avoid certain…
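The classical result the abstract starts from can be checked numerically. The following is a minimal sketch, not the authors' code, and it covers only the standard zero-mean iid case rather than the richer priors the paper introduces. It assumes a single hidden layer with ReLU activations, standard Gaussian weights, no biases, and a 1/sqrt(width) output scaling; the function names `relu_kernel` and `sample_mlp_outputs` are hypothetical. Under those assumptions the limiting covariance is the well-known degree-1 arc-cosine form, and the empirical covariance over many random networks should be close to it.

```python
import numpy as np


def relu_kernel(x, y):
    """Closed form of E[relu(w.x) * relu(w.y)] for w ~ N(0, I).

    This is the degree-1 arc-cosine expression; it is the limiting GP
    covariance under the assumptions stated in the lead-in.
    """
    nx, ny = np.linalg.norm(x), np.linalg.norm(y)
    cos_t = np.clip(x @ y / (nx * ny), -1.0, 1.0)
    theta = np.arccos(cos_t)
    return nx * ny * (np.sin(theta) + (np.pi - theta) * cos_t) / (2 * np.pi)


def sample_mlp_outputs(x, y, width, n_nets, rng):
    """Evaluate n_nets independently drawn one-hidden-layer ReLU MLPs at x and y."""
    d = x.shape[0]
    W = rng.standard_normal((n_nets, width, d))  # input-to-hidden weights
    v = rng.standard_normal((n_nets, width))     # hidden-to-output weights
    hx = np.maximum(W @ x, 0.0)                  # hidden activations, shape (n_nets, width)
    hy = np.maximum(W @ y, 0.0)
    fx = (v * hx).sum(axis=1) / np.sqrt(width)   # scaled network outputs
    fy = (v * hy).sum(axis=1) / np.sqrt(width)
    return fx, fy


rng = np.random.default_rng(0)
x = np.array([1.0, 0.0, 0.0])
y = np.array([0.6, 0.8, 0.0])
fx, fy = sample_mlp_outputs(x, y, width=64, n_nets=20000, rng=rng)
print("empirical covariance:", np.mean(fx * fy))
print("limiting kernel:     ", relu_kernel(x, y))
```

With zero-mean independent output weights the covariance matches the kernel at any width; increasing `width` is what makes the joint distribution of the outputs approach a Gaussian. Non-zero means or partially exchangeable weights, as studied in the paper, would change the kernel and are not modelled here.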
Taxonomy
Topics: Neural Networks and Applications · Blind Source Separation Techniques · Image and Signal Denoising Methods
Methods: Gaussian Process
