On the Asymptotic Learning Curves of Kernel Ridge Regression under Power-law Decay
Yicheng Li, Haobo Zhang, Qian Lin

TL;DR
This paper rigorously characterizes the learning curves of kernel ridge regression under power-law decay and shows that benign overfitting in very wide neural networks occurs only when the noise level is small.
Contribution
It provides a full theoretical analysis of the learning curve for kernel ridge regression under realistic assumptions, clarifying the effects of regularization, source condition, and noise.
Findings
In very wide neural networks, benign overfitting occurs only when the noise level is small.
The choice of regularization parameter critically influences the learning curve.
The source condition and noise level jointly determine the generalization behavior.
Abstract
The widely observed 'benign overfitting phenomenon' in the neural network literature challenges the 'bias-variance trade-off' doctrine of statistical learning theory. Since the generalization ability of a 'lazy trained' over-parametrized neural network can be well approximated by that of neural tangent kernel regression, the curve of the excess risk (namely, the learning curve) of kernel ridge regression has attracted increasing attention recently. However, most recent arguments on the learning curve are heuristic and rely on the 'Gaussian design' assumption. In this paper, under mild and more realistic assumptions, we rigorously provide a full characterization of the learning curve, elaborating the effect and the interplay of the choice of the regularization parameter, the source condition, and the noise. In particular, our results suggest that the 'benign…
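To make the objects in the abstract concrete, the following is a minimal numerical sketch of an empirical learning curve for kernel ridge regression. It is not the paper's construction: the cosine eigenbasis on [0, 1], the eigenvalue decay exponent `beta`, the target-coefficient decay `decay`, the `n * lam` scaling of the ridge term, and all function names are assumptions chosen only for illustration.

```python
import numpy as np


def cosine_basis(x, num_modes):
    """Orthonormal cosine basis e_j(x) = sqrt(2) cos(pi j x) on [0, 1], j = 1..num_modes."""
    j = np.arange(1, num_modes + 1, dtype=float)
    return np.sqrt(2.0) * np.cos(np.pi * np.outer(x, j))


def krr_excess_risk(n, lam, sigma, rng, num_modes=200, beta=2.0, decay=1.5, n_test=2000):
    """Monte Carlo estimate of the excess risk of kernel ridge regression.

    Assumed (hypothetical) setup:
      kernel eigenvalues   lambda_j = j^(-beta)   (power-law decay)
      target coefficients  f_j      = j^(-decay)  (controls smoothness of the target)
      noise                Gaussian with standard deviation sigma
    """
    j = np.arange(1, num_modes + 1, dtype=float)
    sqrt_eigs = j ** (-beta / 2.0)
    coefs = j ** (-decay)

    x_train = rng.uniform(0.0, 1.0, size=n)
    x_test = rng.uniform(0.0, 1.0, size=n_test)
    basis_train = cosine_basis(x_train, num_modes)
    basis_test = cosine_basis(x_test, num_modes)

    # Noisy observations of the target function.
    y_train = basis_train @ coefs + sigma * rng.standard_normal(n)

    # Feature map phi_j(x) = sqrt(lambda_j) e_j(x), so k(x, y) = phi(x) . phi(y).
    phi_train = basis_train * sqrt_eigs
    phi_test = basis_test * sqrt_eigs

    # KRR solution: alpha = (K + n * lam * I)^(-1) y, prediction = k(x, X) alpha.
    gram = phi_train @ phi_train.T
    alpha = np.linalg.solve(gram + n * lam * np.eye(n), y_train)
    pred = phi_test @ (phi_train.T @ alpha)

    # Excess risk: mean squared distance to the noiseless target on fresh inputs.
    return float(np.mean((pred - basis_test @ coefs) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for n in (50, 100, 200, 400):
        risk = krr_excess_risk(n=n, lam=1e-3, sigma=0.1, rng=rng)
        print(f"n={n:4d}  estimated excess risk={risk:.5f}")
```

Plotting the estimated excess risk against `n` on log-log axes, for different choices of `lam` (for example decaying as a power of `n`) and different `sigma`, gives a rough empirical counterpart of the learning curve discussed above. Because the truncation level `num_modes` is finite, this sketch can only approximate the infinite-dimensional setting.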
Taxonomy
Topics: Neural Networks and Applications · Gaussian Processes and Bayesian Inference · Statistical Mechanics and Entropy
