Non-parametric Stochastic Approximation with Large Step sizes
Aymeric Dieuleveut, Francis Bach

TL;DR
This paper shows that averaged stochastic gradient descent for kernel least-squares regression, run without explicit regularization and with a sufficiently large step size, attains optimal convergence rates across different smoothness regimes, even when the optimal predictor is not in the RKHS.
Contribution
It analyzes the averaged, unregularized least-mean-square algorithm (a form of stochastic gradient descent) for non-parametric regression in an RKHS and shows that, with large step sizes, it attains optimal convergence rates.
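For reference, one standard way to write the averaged, unregularized least-mean-square recursion in an RKHS with kernel K is given below. The notation (constant step size γ, kernel section K_x = K(x, ·), observations (x_n, y_n)) is ours; treat it as a sketch, not a transcription of the paper's equations.

```latex
g_0 = 0, \qquad
g_n = g_{n-1} - \gamma \bigl( g_{n-1}(x_n) - y_n \bigr) K_{x_n}, \qquad
\bar{g}_n = \frac{1}{n+1} \sum_{k=0}^{n} g_k .
```

Each step is a stochastic gradient step on the squared loss (1/2)(g(x_n) - y_n)^2 with no penalty term (hence "unregularized"). By the reproducing property, g_{n-1}(x_n) = ⟨g_{n-1}, K_{x_n}⟩, so every update needs only kernel evaluations, and the estimator returned is the average of the iterates.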
Findings
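A sufficiently large step size lets the averaged, unregularized algorithm reach optimal convergence rates.
Optimal rates are achieved for a variety of smoothness regimes of the optimal predictor and of the functions in the RKHS.
The method applies even when the optimal predictor (the conditional expectation) lies outside the RKHS.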
Abstract
We consider the random-design least-squares regression problem within the reproducing kernel Hilbert space (RKHS) framework. Given a stream of independent and identically distributed input/output data, we aim to learn a regression function within an RKHS H, even if the optimal predictor (i.e., the conditional expectation) is not in H. In a stochastic approximation framework where the estimator is updated after each observation, we show that the averaged unregularized least-mean-square algorithm (a form of stochastic gradient), given a sufficiently large step-size, attains optimal rates of convergence for a variety of regimes for the smoothnesses of the optimal prediction function and the functions in H.
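A minimal sketch of the averaged, unregularized kernel LMS iteration described in the abstract, assuming one-dimensional inputs, a Gaussian kernel, and a synthetic target sin(3x). The helper names (gaussian_kernel, averaged_kernel_lms, predict), the bandwidth, the step size, and the data are illustrative assumptions, not taken from the paper. Each iterate is stored as coefficients on the observed inputs, so n observations cost O(n^2) time and memory.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth=0.5):
    """Gaussian kernel matrix K[i, j] = exp(-(a_i - b_j)^2 / (2 * bandwidth^2)).

    Illustrative kernel choice (assumption); any positive-definite kernel works.
    """
    diff = a[:, None] - b[None, :]
    return np.exp(-diff**2 / (2 * bandwidth**2))


def averaged_kernel_lms(x, y, step, kernel=gaussian_kernel):
    """Averaged, unregularized kernel LMS (hypothetical helper name).

    Iterates g_i = g_{i-1} - step * (g_{i-1}(x_i) - y_i) * K(x_i, .), with g_0 = 0,
    storing g_i as coefficients on x_1..x_i, and returns the coefficients of
    the average (g_0 + g_1 + ... + g_n) / (n + 1).
    """
    n = len(x)
    gram = kernel(x, x)      # full Gram matrix: O(n^2) memory, fine for a demo
    alpha = np.zeros(n)      # coefficients of the current iterate
    for i in range(n):
        pred = gram[i, :i] @ alpha[:i]     # current iterate evaluated at x[i]
        alpha[i] = -step * (pred - y[i])   # one stochastic gradient step
    # Coefficient alpha[j] (0-based) is present in iterates j+1, ..., n,
    # i.e. in n - j of the n + 1 averaged iterates.
    counts = n - np.arange(n)
    return alpha * counts / (n + 1)


def predict(x_train, coef, x_new, kernel=gaussian_kernel):
    """Evaluate sum_j coef[j] * K(x_train[j], x_new) at each new point."""
    return kernel(x_new, x_train) @ coef


# Synthetic demo (assumed data): y = sin(3x) + Gaussian noise on [-1, 1].
rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(-1.0, 1.0, n)
y = np.sin(3 * x) + 0.1 * rng.standard_normal(n)

coef = averaged_kernel_lms(x, y, step=0.25)
x_test = np.linspace(-1.0, 1.0, 200)
mse = np.mean((predict(x, coef, x_test) - np.sin(3 * x_test)) ** 2)
print(f"squared error of the averaged estimator vs. sin(3x): {mse:.4f}")
```

The average is computed in closed form rather than by storing every iterate: the coefficient added at step i appears in n - i + 1 of the n + 1 iterates g_0, ..., g_n. The paper's precise step-size condition is not reproduced here; since K(x, x) = 1 for this kernel, step = 0.25 is simply a conservative illustrative choice.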
