On the Regularizing Property of Stochastic Gradient Descent

Bangti Jin; Xiliang Lu

arXiv:1805.10470 · math.NA · December 5, 2018

TL;DR

This paper rigorously proves the regularizing property of stochastic gradient descent (SGD) with early stopping for large-scale linear inverse problems. It combines classical regularization theory with stochastic analysis and supports the results with numerical experiments.

Contribution

It establishes the regularizing property of SGD with early stopping under an a priori stopping rule and proves convergence rates under the canonical sourcewise condition for linear inverse problems.
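As a sketch of the setting in standard notation (the symbols here are assumptions, not quoted from the paper): a linear system Ax = y with rows a_i, i = 1, ..., n, noisy data y^δ, a quadratic functional J, and an SGD step that uses one uniformly drawn index i_k with step size η_k. The last line shows why the single-row gradient is an unbiased estimator of the full gradient.

```latex
\begin{aligned}
J(x) &= \frac{1}{2n}\sum_{i=1}^{n}\bigl(\langle a_i, x\rangle - y_i^\delta\bigr)^2, \\
x_{k+1} &= x_k - \eta_k\bigl(\langle a_{i_k}, x_k\rangle - y_{i_k}^\delta\bigr)a_{i_k},
  \qquad i_k \sim \mathrm{Uniform}\{1,\dots,n\}, \\
\mathbb{E}_{i_k}\Bigl[\bigl(\langle a_{i_k}, x_k\rangle - y_{i_k}^\delta\bigr)a_{i_k}\Bigr]
  &= \frac{1}{n}\sum_{i=1}^{n}\bigl(\langle a_i, x_k\rangle - y_i^\delta\bigr)a_i = \nabla J(x_k).
\end{aligned}
```

Early stopping then means returning x_{k(δ)} for an index k(δ) fixed in advance from the noise level δ (an a priori rule), rather than iterating until J is minimized.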

Findings

01

Proves the regularizing property of SGD with early stopping.

02

Provides convergence rates under the canonical sourcewise condition.

03

Includes numerical experiments illustrating theoretical results.
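For reference, the canonical sourcewise condition is commonly written as follows. The notation is an assumption: x^† is the minimum-norm solution, x_1 the initial guess, A^* the adjoint of A, ν > 0 the smoothness index, and ρ a bound on the source element w. The paper's exact normalization may differ.

```latex
x^\dagger - x_1 = (A^{*}A)^{\nu}\, w, \qquad \|w\| \le \rho .
```

In classical regularization theory, a larger ν encodes more smoothness of x^† relative to x_1 and permits faster convergence rates in δ.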

Abstract

Stochastic gradient descent is one of the most successful approaches for solving large-scale problems, especially in machine learning and statistics. At each iteration, it employs an unbiased estimator of the full gradient computed from a single randomly selected data point. Hence, it scales well with problem size, is very attractive for truly massive datasets, and holds significant potential for solving large-scale inverse problems. In the recent machine learning literature, it has been empirically observed that, when equipped with early stopping, it has a regularizing property. In this work, we rigorously establish its regularizing property (under an a priori early stopping rule), and also prove convergence rates under the canonical sourcewise condition, for minimizing the quadratic functional for linear inverse problems. This is achieved by combining tools from classical…
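The abstract's description can be made concrete with a minimal NumPy sketch. It assumes a toy discretized integration operator as the forward map, Gaussian data noise, and a polynomially decaying step size η_k = c0·k^(−α). The function name `sgd_linear_inverse` and all parameter values are hypothetical and are not taken from the paper. Each iteration uses one randomly selected equation, and that equation's gradient is an unbiased estimate of the full gradient of J(x) = (1/(2n))‖Ax − y^δ‖².

```python
import numpy as np


def sgd_linear_inverse(A, y_delta, num_iters, c0, alpha, x0=None, x_true=None, seed=0):
    """Illustrative SGD sketch for min_x J(x) = 1/(2n) * ||A x - y_delta||^2.

    Each step samples one row index i uniformly at random and uses the
    single-row gradient (<a_i, x> - y_i) a_i, an unbiased estimate of the
    full gradient (1/n) A^T (A x - y_delta). The step size
    eta_k = c0 * k**(-alpha) is an assumed polynomially decaying schedule.
    Returns the final iterate and, if x_true is given, the error history.
    """
    rng = np.random.default_rng(seed)
    n, m = A.shape
    x = np.zeros(m) if x0 is None else np.array(x0, dtype=float)
    errors = []
    if x_true is not None:
        errors.append(np.linalg.norm(x - x_true))
    for k in range(1, num_iters + 1):
        i = rng.integers(n)                # one randomly selected data point
        a_i = A[i]
        residual_i = a_i @ x - y_delta[i]  # scalar residual of equation i
        eta_k = c0 * k ** (-alpha)         # decaying step size
        x = x - eta_k * residual_i * a_i
        if x_true is not None:
            errors.append(np.linalg.norm(x - x_true))
    return x, np.array(errors)


# Hypothetical toy problem: discretized integration operator on [0, 1].
n = 100
h = 1.0 / n
t = (np.arange(1, n + 1) - 0.5) * h
A = h * np.tril(np.ones((n, n)))  # (A x)_i approximates the integral of x over [0, t_i]
x_true = np.sin(np.pi * t)
noise_rng = np.random.default_rng(1)
delta = 1e-3
y_delta = A @ x_true + delta * noise_rng.standard_normal(n)

# Since eta_k <= c0, this choice keeps eta_k * ||a_i||^2 <= 1 for every row.
c0 = 1.0 / np.max(np.sum(A**2, axis=1))
x_k, errors = sgd_linear_inverse(A, y_delta, num_iters=20000, c0=c0,
                                 alpha=0.1, x_true=x_true)
k_best = int(np.argmin(errors))
print(f"initial error {errors[0]:.3f}, best error {errors[k_best]:.3f} "
      f"at k = {k_best}, final error {errors[-1]:.3f}")
```

Here `k_best` relies on `x_true` and serves only as an oracle diagnostic for checking whether the error first decreases and later grows (semi-convergence). An a priori rule of the kind analyzed in the paper would instead fix the stopping index from the noise level δ alone. The values of `alpha`, `c0`, and `delta` are illustrative choices.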