Finite Versus Infinite Neural Networks: an Empirical Study
Jaehoon Lee, Samuel S. Schoenholz, Jeffrey Pennington, Ben Adlam, Lechao Xiao, Roman Novak, Jascha Sohl-Dickstein

TL;DR
This study empirically compares finite neural networks with their infinite-width kernel counterparts. It identifies key differences between the two, examines the effects of several regularization choices, and proposes best practices for kernel-based prediction that achieve state-of-the-art results.
Contribution
It provides a comprehensive empirical analysis of the correspondence between finite neural networks and their infinite-width kernel counterparts, and introduces improved practices and insights into how the two differ.
Findings
Kernel methods outperform finite fully-connected networks.
Convolutional networks outperform fully-connected networks.
NNGP kernels often outperform NT kernels.
Abstract
We perform a careful, thorough, and large scale empirical study of the correspondence between wide neural networks and kernel methods. By doing so, we resolve a variety of open questions related to the study of infinitely wide neural networks. Our experimental results include: kernel methods outperform fully-connected finite-width networks, but underperform convolutional finite width networks; neural network Gaussian process (NNGP) kernels frequently outperform neural tangent (NT) kernels; centered and ensembled finite networks have reduced posterior variance and behave more similarly to infinite networks; weight decay and the use of a large learning rate break the correspondence between finite and infinite networks; the NTK parameterization outperforms the standard parameterization for finite width networks; diagonal regularization of kernels acts similarly to early stopping; floating…
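One way to read the claim that diagonal regularization of kernels acts similarly to early stopping is through spectral filters. The sketch below is a minimal illustration, not the paper's code. It assumes kernel regression with squared loss, where diagonal regularization `eps` keeps a fraction `lam / (lam + eps)` of the target along a kernel eigendirection with eigenvalue `lam`, while gradient flow from a zero initial function, stopped at time `t` with learning rate `lr`, fits a fraction `1 - exp(-lr * lam * t)`. The function names, the example spectrum, and the matching rule `t = 1 / (lr * eps)` are illustrative assumptions.

```python
import math


def ridge_filter(eigenvalue, eps):
    """Fraction of a target component kept by K (K + eps * I)^{-1}."""
    return eigenvalue / (eigenvalue + eps)


def gradient_flow_filter(eigenvalue, lr, t):
    """Fraction of a target component fitted after time t of gradient flow
    on squared loss, starting from a zero function."""
    return 1.0 - math.exp(-lr * eigenvalue * t)


# Hypothetical kernel spectrum, chosen only for illustration.
eigenvalues = [10.0, 1.0, 0.1, 0.01]

eps = 0.1             # diagonal regularization strength (assumed value)
lr = 1.0              # learning rate (assumed value)
t = 1.0 / (lr * eps)  # heuristic matching of stopping time to eps (assumption)

for lam in eigenvalues:
    ridge = ridge_filter(lam, eps)
    flow = gradient_flow_filter(lam, lr, t)
    print(f"eigenvalue={lam:6.2f}  ridge={ridge:.3f}  early_stop={flow:.3f}")
```

Both filters keep directions with large eigenvalues almost entirely and suppress directions with small eigenvalues, which is the sense in which a larger `eps` and an earlier stopping time play similar roles. The two curves are not identical, so this should be read as a qualitative correspondence only.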
Taxonomy
Topics: Advanced Neural Network Applications · Gaussian Processes and Bayesian Inference · Machine Learning and Data Classification
Methods: Neural Tangent Kernel · Gaussian Process · ZCA Whitening · Weight Decay
