TL;DR
This paper develops an analytical theory for understanding how kernel regression and infinitely wide neural networks generalize, highlighting the role of task-model alignment and spectral properties in predicting performance.
Contribution
It derives an analytical expression for the generalization error of kernel regression that applies to any kernel and data distribution, and uses the kernel's spectral properties to explain generalization in infinitely wide neural networks.
Findings
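Kernel regression is biased toward simple functions, as characterized by the kernel's eigenfunctions, and it generalizes well when the kernel is compatible with the task.
More data can impair generalization when the labels are noisy or the kernel is incompatible with the target.
Rotation-invariant kernels exhibit non-monotonic learning curves in high dimensions.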
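The first finding can be illustrated with a small numerical sketch. It is not the paper's code or its exact definition of alignment: it uses the eigenvectors of an empirical Gram matrix as a stand-in for the kernel's eigenfunctions, and the RBF kernel, the length scale, the two target functions, and the helper names `rbf_kernel` and `cumulative_power` are all assumptions chosen for illustration. The sketch measures how much of a target's squared norm falls on the top eigenvectors of the kernel.

```python
import numpy as np

def rbf_kernel(X, Y, length_scale=0.5):
    """Gaussian (RBF) kernel matrix between the rows of X and the rows of Y."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * length_scale**2))

def cumulative_power(K, y):
    """Fraction of ||y||^2 captured by the top-k eigenvectors of K, for k = 1..n."""
    eigvals, eigvecs = np.linalg.eigh(K)        # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]           # reorder to descending
    coeffs = eigvecs[:, order].T @ y            # target coefficients in the eigenbasis
    power = coeffs**2
    return np.cumsum(power) / np.sum(power)

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 1))       # hypothetical 1-D inputs
K = rbf_kernel(X, X)

y_smooth = np.sin(np.pi * X[:, 0])              # low-frequency target
y_rough = np.sin(15 * np.pi * X[:, 0])          # high-frequency target

c_smooth = cumulative_power(K, y_smooth)
c_rough = cumulative_power(K, y_rough)
print("power in top 10 modes (smooth):", c_smooth[9])
print("power in top 10 modes (rough): ", c_rough[9])
```

A target whose power concentrates in the leading modes is, in this loose sense, well aligned with the kernel; a target that spreads its power over many low-eigenvalue modes is not.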
Abstract
Generalization beyond a training dataset is a main goal of machine learning, but theoretical understanding of generalization remains an open problem for many models. The need for a new theory is exacerbated by recent observations in deep neural networks where overparameterization leads to better performance, contradicting the conventional wisdom from classical statistics. In this paper, we investigate generalization error for kernel regression, which, besides being a popular machine learning method, also includes infinitely overparameterized neural networks trained with gradient descent. We use techniques from statistical mechanics to derive an analytical expression for generalization error applicable to any kernel or data distribution. We present applications of our theory to real and synthetic datasets, and for many kernels including those that arise from training deep neural networks…
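For readers who want to see what a learning curve for kernel regression looks like in practice, the following is a minimal simulation sketch. It estimates the test error empirically by averaging over random training sets; it does not implement the paper's analytical expression. The RBF kernel, the target function, the noise level, the ridge parameter, the sample sizes, and the helper `krr_test_error` are all hypothetical choices made for illustration.

```python
import numpy as np

def rbf_kernel(X, Y, length_scale=0.5):
    """Gaussian (RBF) kernel matrix between the rows of X and the rows of Y."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * length_scale**2))

def target(X):
    """Hypothetical noiseless target function."""
    return np.sin(np.pi * X[:, 0])

def krr_test_error(n_train, noise_std, ridge, rng, n_test=500):
    """Mean squared test error of kernel ridge regression on one random training set."""
    X_tr = rng.uniform(-1.0, 1.0, size=(n_train, 1))
    X_te = rng.uniform(-1.0, 1.0, size=(n_test, 1))
    y_tr = target(X_tr) + noise_std * rng.standard_normal(n_train)  # noisy labels
    K = rbf_kernel(X_tr, X_tr)
    alpha = np.linalg.solve(K + ridge * np.eye(n_train), y_tr)      # (K + ridge*I)^{-1} y
    pred = rbf_kernel(X_te, X_tr) @ alpha
    return float(np.mean((pred - target(X_te)) ** 2))               # error vs. clean target

rng = np.random.default_rng(0)
sizes = [5, 10, 20, 40, 80]
# Average over 20 random training sets per sample size.
curve = [float(np.mean([krr_test_error(n, 0.1, 1e-3, rng) for _ in range(20)]))
         for n in sizes]
for n, err in zip(sizes, curve):
    print(f"n = {n:3d}  test MSE = {err:.4f}")
```

Varying `noise_std`, `ridge`, or the target in this sketch is one way to explore empirically how noise and kernel-task mismatch change the shape of the curve.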
Taxonomy
Methods: Linear Regression
