Learning Curves of Stochastic Gradient Descent in Kernel Regression
Haihan Zhang, Weicheng Lin, Yuanshi Liu, Cong Fang

TL;DR
This paper analyzes the performance of stochastic gradient descent in kernel regression, revealing its optimal convergence rates and advantages over other methods, especially under model misspecification and specific step size schedules.
Contribution
It provides the first theoretical analysis showing SGD's optimal rates and advantages over iterative averaging in kernel regression with misspecified models.
Findings
SGD achieves min-max optimal rates across sample sizes.
SGD overcomes saturation phenomena common in ridge regression.
Exponential decay step size is key to SGD's success.
Abstract
This paper considers a canonical problem in kernel regression: how well does a model perform when it is trained by popular online first-order algorithms, compared with offline methods such as ridge and ridgeless regression? We analyze the foundational single-pass Stochastic Gradient Descent (SGD) in kernel regression under a source condition in which the optimal predictor may not even belong to the RKHS, i.e., the model is misspecified. Specifically, we focus on the inner product kernel over the sphere and characterize the exact orders of the excess risk curves under different scales of sample size relative to the input dimension. Surprisingly, we show that SGD achieves min-max optimal rates up to constants across all the scales, without suffering from saturation, a prevalent phenomenon observed in (ridge) regression, except when the model is highly misspecified…
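To make the setting concrete, the following is a minimal sketch of single-pass SGD in kernel regression with an inner product kernel on the sphere and a step size that decays exponentially in phases. It is an illustration under stated assumptions, not the authors' exact procedure: the kernel profile h(u) = exp(u), the halving schedule over roughly log2(n) phases, the initial step size eta0, and all function names are hypothetical choices made here. The update itself is the standard functional SGD step for squared loss, f_t = f_{t-1} + eta_t (y_t - f_{t-1}(x_t)) K(x_t, ·).

```python
import numpy as np


def sample_sphere(n, d, rng):
    """Draw n points uniformly from the unit sphere in R^d."""
    x = rng.standard_normal((n, d))
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def inner_product_kernel(X, Z):
    """K(x, z) = h(<x, z>) with h(u) = exp(u); the profile h is an assumption."""
    return np.exp(X @ Z.T)


def exp_decay_steps(n, eta0):
    """Piecewise-constant schedule: halve the step size after each phase.

    Uses floor(log2(n)) phases of (almost) equal length; this specific
    schedule is an assumed example of exponential decay.
    """
    n_phases = max(1, int(n).bit_length() - 1)  # floor(log2(n)) for n >= 2
    phase_len = int(np.ceil(n / n_phases))
    phases = np.arange(n) // phase_len
    return eta0 / 2.0 ** phases


def kernel_sgd(X, y, eta0=0.1):
    """One pass over (X, y); returns coefficients alpha of the last iterate.

    The predictor is f(x) = sum_i alpha[i] * K(X[i], x).
    """
    n = X.shape[0]
    etas = exp_decay_steps(n, eta0)
    alpha = np.zeros(n)
    for t in range(n):
        # Prediction of the current iterate at the new sample x_t.
        k_t = inner_product_kernel(X[t:t + 1], X[:t])[0]
        pred = k_t @ alpha[:t]
        # Functional gradient step: add eta_t * residual * K(x_t, .).
        alpha[t] = etas[t] * (y[t] - pred)
    return alpha


def predict(alpha, X_train, X_new):
    """Evaluate the learned predictor at new points."""
    return inner_product_kernel(X_new, X_train) @ alpha


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 500, 5
    X = sample_sphere(n, d, rng)
    y = X[:, 0] + 0.1 * rng.standard_normal(n)  # toy target, assumed
    alpha = kernel_sgd(X, y, eta0=0.1)
    X_test = sample_sphere(1000, d, rng)
    mse = np.mean((predict(alpha, X, X_test) - X_test[:, 0]) ** 2)
    print(f"test excess-risk estimate: {mse:.4f}")
```

Each sample is used exactly once, so the per-step cost grows with the number of samples already seen. The last iterate is returned without averaging, which is the regime where a decaying step size matters for controlling the variance.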
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Gaussian Processes and Bayesian Inference · Adversarial Robustness in Machine Learning
