On the Regularizing Property of Stochastic Gradient Descent
Bangti Jin, Xiliang Lu

TL;DR
This paper rigorously proves that stochastic gradient descent (SGD) with a priori early stopping has a regularizing property for linear inverse problems. The proof combines regularization theory with stochastic analysis, and numerical experiments illustrate the results.
Contribution
It establishes the regularizing property of SGD under an a priori early stopping rule and proves convergence rates under the canonical sourcewise condition for linear inverse problems.
Findings
Proves the regularizing property of SGD with early stopping.
Provides convergence rates under the canonical sourcewise condition.
Includes numerical experiments illustrating theoretical results.
Abstract
Stochastic gradient descent is one of the most successful approaches for solving large-scale problems, especially in machine learning and statistics. At each iteration, it employs an unbiased estimator of the full gradient computed from a single randomly selected data point. Hence, it scales well with problem size, is very attractive for truly massive datasets, and holds significant potential for solving large-scale inverse problems. In the recent machine learning literature, it was empirically observed that, when equipped with early stopping, it has a regularizing property. In this work, we rigorously establish its regularizing property (under an a priori early stopping rule), and also prove convergence rates under the canonical sourcewise condition, for minimizing the quadratic functional for linear inverse problems. This is achieved by combining tools from classical…
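The iteration described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. It assumes the quadratic functional is (1/2n)·||Ax − y||², with A an n×m matrix whose rows a_i are the individual data points. The function name `sgd_early_stopping`, the zero initial guess, the polynomially decaying step size `c0 * k**(-alpha)`, and the toy problem are all assumptions of this sketch.

```python
import numpy as np


def sgd_early_stopping(A, y_delta, n_iters, c0=None, alpha=0.1, seed=0):
    """SGD for min_x (1/2n) * ||A x - y_delta||^2, stopped after n_iters steps."""
    rng = np.random.default_rng(seed)
    n, m = A.shape
    if c0 is None:
        # Assumed safeguard: keep c0 * max_i ||a_i||^2 <= 1 so no step overshoots.
        c0 = 1.0 / np.max(np.sum(A**2, axis=1))
    x = np.zeros(m)  # initial guess x_1 = 0 (an assumption of this sketch)
    for k in range(1, n_iters + 1):
        i = rng.integers(n)               # one data point, drawn uniformly at random
        eta = c0 * k ** (-alpha)          # assumed polynomially decaying step size
        residual = A[i] @ x - y_delta[i]  # scalar misfit for the selected row
        x = x - eta * residual * A[i]     # step along an unbiased gradient estimate
    return x


# Hypothetical toy problem (sizes and noise level chosen only for illustration).
rng = np.random.default_rng(42)
A = rng.standard_normal((50, 10))
x_true = rng.standard_normal(10)
y_exact = A @ x_true
y_noisy = y_exact + 0.01 * rng.standard_normal(50)

x_exact_data = sgd_early_stopping(A, y_exact, n_iters=500)
x_noisy_data = sgd_early_stopping(A, y_noisy, n_iters=500)  # n_iters = stopping index
err_init = np.linalg.norm(x_true)
err_exact = np.linalg.norm(x_exact_data - x_true)
print(err_init, err_exact, np.linalg.norm(x_noisy_data - x_true))
```

Here `n_iters` plays the role of the a priori stopping index. The sketch does not specify how that index should scale with the noise level, which is the question the paper's analysis addresses.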
