On the Benefits of Large Learning Rates for Kernel Methods
Gaspard Beugnot, Julien Mairal, Alessandro Rudi

TL;DR
This paper demonstrates that large learning rates in kernel methods can improve generalization, especially with early stopping, by influencing spectral properties of the solution, extending insights from deep learning to kernel ridge regression.
Contribution
It provides a theoretical analysis of how large learning rates affect kernel methods, revealing their benefits for generalization and spectral properties of solutions.
Findings
Large learning rates can enhance generalization in kernel methods.
With early stopping, the choice of learning rate determines how the obtained solution decomposes along the Hessian's eigenvectors.
The phenomenon is reported for both regression and classification tasks, even without a mismatch between the train and test distributions.
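The spectral effect described in these findings can be made explicit with the standard gradient descent recursion on a quadratic objective. The notation below (H for the Hessian, η for the learning rate, (λ_i, u_i) for the eigenpairs of H, x* for the minimizer) is introduced here for illustration and is not taken from the paper; this is a textbook identity, not the paper's full analysis.

```latex
% Quadratic objective and gradient descent step (illustrative notation)
f(x) = \tfrac{1}{2}\langle x, H x\rangle - \langle b, x\rangle,
\qquad
x_{t+1} = x_t - \eta\,(H x_t - b),
\qquad
H x^\star = b .

% Error recursion: subtract x^\star from both sides of the update
x_{t} - x^\star = (I - \eta H)^{t}\,(x_0 - x^\star).

% Projection on the i-th eigenvector u_i of H (eigenvalue \lambda_i)
\langle x_t - x^\star, u_i\rangle
  = (1 - \eta\lambda_i)^{t}\,\langle x_0 - x^\star, u_i\rangle,
\qquad
0 < \eta < \frac{2}{\lambda_{\max}} .
```

For small η, the factor |1 − ηλ_i| is smallest for the largest eigenvalues, so an early-stopped iterate has mostly fit the top eigendirections. When η approaches 2/λ_max, |1 − ηλ_max| is close to 1, so the top eigendirection becomes the slowest to converge and the early-stopped solution puts relatively more weight on directions with smaller eigenvalues.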
Abstract
This paper studies an intriguing phenomenon related to the good generalization performance of estimators obtained by using large learning rates within gradient descent algorithms. We show that this phenomenon, first observed in the deep learning literature, can be precisely characterized in the context of kernel methods, even though the resulting optimization problem is convex. Specifically, we consider the minimization of a quadratic objective in a separable Hilbert space, and show that with early stopping, the choice of learning rate influences the spectral decomposition of the obtained solution on the Hessian's eigenvectors. This extends an intuition described by Nakkiran (2020) on a two-dimensional toy problem to realistic learning scenarios such as kernel ridge regression. While large learning rates may be proven beneficial as soon as there is a mismatch between the train and test…
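A minimal numerical sketch of the early-stopping effect described in the abstract, on a two-dimensional quadratic in the spirit of the toy problem it mentions. Everything specific here is an assumption chosen for illustration: the diagonal Hessian with eigenvalues 1.0 and 0.1, the learning rates 0.5 and 1.9, the 10-step budget, and the function names `gd_spectral_residuals` and `run_gd`. It is not the paper's experimental setup.

```python
import numpy as np


def gd_spectral_residuals(eigvals, lr, n_steps):
    """Fraction of the initial error left along each Hessian eigenvector.

    For gradient descent on a quadratic, the error along the eigenvector
    with eigenvalue lam is multiplied by (1 - lr * lam) at every step.
    """
    return np.abs(1.0 - lr * np.asarray(eigvals)) ** n_steps


def run_gd(H, b, lr, n_steps):
    """Plain gradient descent on f(x) = 0.5 * x^T H x - b^T x, from x = 0."""
    x = np.zeros_like(b)
    for _ in range(n_steps):
        x = x - lr * (H @ x - b)  # gradient of f at x is H x - b
    return x


if __name__ == "__main__":
    eigvals = np.array([1.0, 0.1])   # hypothetical Hessian spectrum
    H = np.diag(eigvals)             # eigenvectors are the coordinate axes
    x_star = np.array([1.0, 1.0])    # minimizer, so b = H @ x_star
    b = H @ x_star

    for lr in (0.5, 1.9):            # both below 2 / lambda_max = 2.0
        x = run_gd(H, b, lr, n_steps=10)
        print(f"lr={lr}: |x - x*| per eigendirection =", np.abs(x - x_star))
        print("  closed form:", gd_spectral_residuals(eigvals, lr, 10))
```

With the small learning rate, the early-stopped iterate is closest to the minimizer along the large-eigenvalue direction; with the large learning rate the ordering flips, which is the sense in which the learning rate shapes the spectral decomposition of the early-stopped solution.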
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Gaussian Processes and Bayesian Inference · Statistical Methods and Inference
