Spectral Bias Outside the Training Set for Deep Networks in the Kernel Regime
Benjamin Bowman, Guido Montufar

TL;DR
This paper quantifies how deep neural networks of finite width, trained on finitely many samples, are biased toward learning the top eigenfunctions of the Neural Tangent Kernel over the entire input space, not only on the training set.
Contribution
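It bounds, in function space, the difference between the training trajectory of a finite-width network on finitely many samples and the idealized kernel dynamics of infinite width and infinite data. The bounds imply a bias toward the top eigenfunctions of the Neural Tangent Kernel that does not depend on the target function.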
Findings
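The bias depends only on the model architecture and input distribution, not on the target function
Width does not need to grow polynomially with the number of samples to obtain high-probability bounds up to a stopping time
Results hold for deep architectures with fully connected, convolutional, and residual layers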
Abstract
We provide quantitative bounds measuring the difference in function space between the trajectory of a finite-width network trained on finitely many samples and the idealized kernel dynamics of infinite width and infinite data. An implication of the bounds is that the network is biased to learn the top eigenfunctions of the Neural Tangent Kernel not just on the training set but over the entire input space. This bias depends on the model architecture and input distribution alone and thus does not depend on the target function, which does not need to be in the RKHS of the kernel. The result is valid for deep architectures with fully connected, convolutional, and residual layers. Furthermore, the width does not need to grow polynomially with the number of samples in order to obtain high-probability bounds up to a stopping time. The proof exploits the low-effective-rank property of the…
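The idealized kernel dynamics mentioned in the abstract can be illustrated with a small numerical sketch. It is not the paper's construction: a Gaussian kernel stands in for the Neural Tangent Kernel, the dynamics are restricted to the training points, and the names `kernel_flow_residual`, `bandwidth`, and `t` are hypothetical. Under kernel gradient flow on the squared loss, the residual component along each eigenvector of the normalized kernel matrix decays as exp(-lambda_i * t), so the directions with the largest eigenvalues are fit first.

```python
import numpy as np

def kernel_flow_residual(K, r0, t):
    """Closed-form residual at time t for the flow dr/dt = -(1/n) K r.

    K  : (n, n) symmetric positive semidefinite kernel matrix
    r0 : (n,) initial residual (prediction minus target)
    Returns the residual at time t, the eigenvalues of K / n in
    ascending order, and the matching eigenvectors (as columns).
    """
    n = K.shape[0]
    lam, V = np.linalg.eigh(K / n)
    coeffs = V.T @ r0                      # residual in the eigenbasis
    return V @ (np.exp(-lam * t) * coeffs), lam, V

rng = np.random.default_rng(0)
n = 200
x = np.sort(rng.uniform(-1.0, 1.0, n))

# Assumption: a Gaussian kernel used as a stand-in for the NTK.
bandwidth = 0.3
K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2.0 * bandwidth ** 2))

# A discontinuous target, chosen so that nothing relies on smoothness.
y = np.sign(np.sin(6.0 * x))
r0 = -y                                    # model starts at zero output

t = 50.0
r_t, lam, V = kernel_flow_residual(K, r0, t)

# Per-direction decay factors: small for top eigenvalues, near 1 for bottom.
decay = np.exp(-lam * t)
print("decay along top eigenvector:   ", decay[-1])
print("decay along bottom eigenvector:", decay[0])
```

The decay factors depend only on the kernel matrix, which is built from the inputs `x` alone; the target `y` enters only through the initial coefficients. This mirrors, on the training set, the statement that the bias is determined by the architecture and input distribution. The paper's contribution concerns the harder setting of the whole input space with finite width and finite data, which this sketch does not reproduce.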
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Gaussian Processes and Bayesian Inference · Machine Learning and ELM
