On the Benefits of Large Learning Rates for Kernel Methods

Gaspard Beugnot; Julien Mairal; Alessandro Rudi

arXiv:2202.13733·stat.ML·June 6, 2022

TL;DR

This paper demonstrates that large learning rates in kernel methods can improve generalization, especially with early stopping, by influencing spectral properties of the solution, extending insights from deep learning to kernel ridge regression.

Contribution

The paper gives a theoretical analysis of how large learning rates affect kernel methods, showing that they can benefit generalization by shaping the spectral properties of the solution.

Findings

01 Large learning rates can enhance generalization in kernel methods.

02 Early stopping interacts with learning rate to influence spectral decomposition.

03 The phenomenon applies to both regression and classification tasks without distribution mismatch.

Abstract

This paper studies an intriguing phenomenon: estimators obtained with large learning rates in gradient descent algorithms often generalize well. We show that this phenomenon, first observed in the deep learning literature, can be precisely characterized in the context of kernel methods, even though the resulting optimization problem is convex. Specifically, we consider the minimization of a quadratic objective in a separable Hilbert space and show that, with early stopping, the choice of learning rate influences the spectral decomposition of the obtained solution on the Hessian's eigenvectors. This extends an intuition described by Nakkiran (2020) on a two-dimensional toy problem to realistic learning scenarios such as kernel ridge regression. While large learning rates may be proven beneficial as soon as there is a mismatch between the train and test…
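The spectral effect described in the abstract can be illustrated on a two-dimensional quadratic. The sketch below is not taken from the paper: the function names, the eigenvalues (1.0 and 0.1), the learning rates (0.1 and 1.9), and the step count are all assumptions chosen for illustration. It relies only on the standard fact that, for gradient descent on a quadratic with Hessian eigenvalue λ, the error along the matching eigenvector is multiplied by (1 - ηλ) at each step.

```python
import numpy as np


def residual_by_eigendirection(eigvals, lr, n_steps):
    """Factor multiplying the initial error along each Hessian eigenvector
    after n_steps of gradient descent on a quadratic objective."""
    eigvals = np.asarray(eigvals, dtype=float)
    return (1.0 - lr * eigvals) ** n_steps


def run_gd(hessian, target, lr, n_steps):
    """Plain gradient descent on f(w) = 0.5 (w - target)^T H (w - target),
    started from w = 0 and stopped after n_steps (early stopping)."""
    w = np.zeros_like(target, dtype=float)
    for _ in range(n_steps):
        w = w - lr * hessian @ (w - target)
    return w


if __name__ == "__main__":
    # Hypothetical toy problem: diagonal Hessian, so the eigenvectors are the axes.
    eigvals = np.array([1.0, 0.1])
    hessian = np.diag(eigvals)
    target = np.ones(2)
    n_steps = 20

    for lr in (0.1, 1.9):  # both satisfy lr < 2 / max(eigvals), so GD converges
        w = run_gd(hessian, target, lr, n_steps)
        remaining = np.abs(residual_by_eigendirection(eigvals, lr, n_steps))
        print(f"lr={lr}: iterate={w}, remaining error per direction={remaining}")
```

With these assumed values, both learning rates shrink the error along the large-eigenvalue direction by the same magnitude per step (|1 - ηλ| = 0.9), while the larger rate shrinks the error along the small-eigenvalue direction much faster. Stopped at the same iteration, the two runs therefore return solutions with different weight on each eigenvector, which is the kind of learning-rate dependence the abstract refers to.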

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Gaussian Processes and Bayesian Inference · Statistical Methods and Inference