A Gaussian Process View on Observation Noise and Initialization in Wide Neural Networks
Sergio Calvo-Ordoñez, Jonathan Plenk, Richard Bergna, Alvaro Cartea, Jose Miguel Hernandez-Lobato, Konstantina Palla, Kamil Ciosek

TL;DR
This paper introduces methods to incorporate observation noise and arbitrary prior means into the Gaussian Process view of wide neural networks, enhancing their practical applicability.
Contribution
It proposes a regularizer for noisy data and a shifted network architecture for arbitrary prior means, extending NTK-GP equivalence to more realistic scenarios.
Findings
Regularizer effectively models observation noise in NTK-GP.
Shifted network enables arbitrary prior means without kernel inversion.
Experiments across datasets validate the theoretical results.
Abstract
Performing gradient descent in a wide neural network is equivalent to computing the posterior mean of a Gaussian Process with the Neural Tangent Kernel (NTK-GP), for a specific prior mean and with zero observation noise. However, existing formulations have two limitations: (i) the NTK-GP assumes noiseless targets, leading to misspecification on noisy data; (ii) the equivalence does not extend to arbitrary prior means, which are essential for well-specified models. To address (i), we introduce a regularizer into the training objective, showing its correspondence to incorporating observation noise in the NTK-GP. To address (ii), we propose a shifted network that enables arbitrary prior means and allows obtaining the posterior mean with gradient descent on a single network, without ensembling or kernel inversion. We validate our results with experiments across datasets and…
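The correspondence in (i) between a training regularizer and GP observation noise can be illustrated on a toy problem. The sketch below is not the paper's implementation: it assumes a model that is linear in its parameters (random cosine features standing in for a linearized wide network) and assumes the regularizer is the squared distance to the initial parameters scaled by the noise variance, which is the standard ridge-regression form. All names (`features`, `theta0`, `sigma2`, and so on) are hypothetical. Under these assumptions, gradient descent on the regularized loss should converge to the GP posterior mean whose kernel is the feature inner product, whose prior mean is the function at initialization, and whose observation-noise variance is `sigma2`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes and an assumed observation-noise variance (illustrative values).
n_train, n_test, n_feat = 20, 5, 200
sigma2 = 0.1


def features(x, W, b):
    # Random cosine features; the implied kernel is k(x, x') = phi(x) @ phi(x').
    return np.sqrt(2.0 / W.shape[0]) * np.cos(np.outer(x, W) + b)


W = rng.normal(size=n_feat)
b = rng.uniform(0.0, 2.0 * np.pi, size=n_feat)

x_train = rng.uniform(-3.0, 3.0, size=n_train)
x_test = np.linspace(-3.0, 3.0, n_test)
y_train = np.sin(x_train) + np.sqrt(sigma2) * rng.normal(size=n_train)

Phi = features(x_train, W, b)      # shape (n_train, n_feat)
Phi_test = features(x_test, W, b)  # shape (n_test, n_feat)

# Random initialization; f(x; theta0) plays the role of the prior mean.
theta0 = rng.normal(size=n_feat)

K = Phi @ Phi.T            # train/train kernel matrix
K_star = Phi_test @ Phi.T  # test/train kernel matrix


def gd_predict(steps=5000):
    # Gradient descent on
    #   0.5 * ||Phi @ theta - y||^2 + 0.5 * sigma2 * ||theta - theta0||^2
    # with a step size chosen from the largest curvature so it converges.
    lr = 1.0 / (np.linalg.eigvalsh(K).max() + sigma2)
    theta = theta0.copy()
    for _ in range(steps):
        grad = Phi.T @ (Phi @ theta - y_train) + sigma2 * (theta - theta0)
        theta -= lr * grad
    return Phi_test @ theta


def gp_posterior_mean():
    # Closed-form GP posterior mean with prior mean m(x) = phi(x) @ theta0
    # and observation-noise variance sigma2.
    m_train = Phi @ theta0
    m_test = Phi_test @ theta0
    alpha = np.linalg.solve(K + sigma2 * np.eye(n_train), y_train - m_train)
    return m_test + K_star @ alpha


gd_mean = gd_predict()
gp_mean = gp_posterior_mean()
print("max |GD - GP| on test points:", np.max(np.abs(gd_mean - gp_mean)))
```

In this sketch the prior mean is fixed to whatever function the random initialization happens to define. The shifted network described in (ii), which allows an arbitrary prior mean, is not covered here.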
