Benign Underfitting of Stochastic Gradient Descent

Tomer Koren; Roi Livni; Yishay Mansour; Uri Sherman

arXiv:2202.13361·cs.LG·January 13, 2023

TL;DR

This paper shows that one-pass, without-replacement stochastic gradient descent (SGD) can return solutions whose empirical risk and generalization gap are both Ω(1), even though SGD is classically known to minimize the population risk. This challenges the view that SGD generalizes by fitting the training data well.

Contribution

The paper demonstrates that one-pass, without-replacement SGD can have a generalization gap of Ω(1) on some convex problem instances, contrasts this with with-replacement SGD, and provides new bounds for the multi-epoch regime in convex optimization.

Findings

01. On some problem instances, one-pass SGD exhibits both empirical risk and a generalization gap of Ω(1).

02. With-replacement SGD converges at the optimal rate.

03. New bounds for the multi-epoch regime improve on previous results.
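The quantities named in the findings can be written out as follows. The notation here ($f$, $\mathcal{D}$, $F$, $\widehat{F}$, $\widehat{w}$) is standard stochastic convex optimization shorthand assumed for illustration, not necessarily the symbols used in the paper.

```latex
% Assumed notation: f(w; z) is a convex loss, z_1, ..., z_n are i.i.d. draws from D,
% and \widehat{w} is the solution returned by SGD.
\begin{align*}
  F(w) &= \mathbb{E}_{z \sim \mathcal{D}}\bigl[ f(w; z) \bigr]
    && \text{(population risk)} \\
  \widehat{F}(w) &= \frac{1}{n} \sum_{i=1}^{n} f(w; z_i)
    && \text{(empirical risk)} \\
  \Delta(\widehat{w}) &= \bigl| F(\widehat{w}) - \widehat{F}(\widehat{w}) \bigr|
    && \text{(generalization gap)}
\end{align*}
```

Read this way, the first finding says that $\Delta(\widehat{w})$ can stay bounded away from zero as $n$ grows on some instances, even while $F(\widehat{w})$ approaches its minimum.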

Abstract

We study to what extent may stochastic gradient descent (SGD) be understood as a "conventional" learning rule that achieves generalization performance by obtaining a good fit to training data. We consider the fundamental stochastic convex optimization framework, where (one pass, without-replacement) SGD is classically known to minimize the population risk at rate $O(1/\sqrt{n})$, and prove that, surprisingly, there exist problem instances where the SGD solution exhibits both empirical risk and generalization gap of $\Omega(1)$. Consequently, it turns out that SGD is not algorithmically stable in any sense, and its generalization ability cannot be explained by uniform convergence or any other currently known generalization bound technique for that matter (other than that of its classical analysis). We then continue to analyze the closely related with-replacement SGD, for which we show…
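To make the two sampling schemes in the abstract concrete, here is a minimal sketch that differs only in how example indices are chosen. Everything in it is an illustrative assumption: the one-dimensional squared loss, the helper names (`grad`, `sgd`, `one_pass_indices`, `with_replacement_indices`, `empirical_risk`), the $1/\sqrt{n}$ step size, and the use of the averaged iterate. It does not reproduce the paper's problem instances, so this toy problem should not be expected to show the $\Omega(1)$ gap.

```python
import math
import random


def grad(w, z):
    # Gradient of the illustrative convex loss f(w; z) = 0.5 * (w - z) ** 2.
    return w - z


def sgd(samples, indices, step):
    """Run SGD over the given index order and return the averaged iterate."""
    w, total = 0.0, 0.0
    for i in indices:
        total += w  # accumulate the iterate before it is updated
        w -= step * grad(w, samples[i])
    return total / len(indices)


def one_pass_indices(n):
    # One pass, without replacement: every example is visited exactly once.
    return list(range(n))


def with_replacement_indices(n, rng):
    # With replacement: n independent uniform draws from the training set,
    # so some examples may repeat and others may never be visited.
    return [rng.randrange(n) for _ in range(n)]


def empirical_risk(w, samples):
    # Average loss of w over the training sample.
    return sum(0.5 * (w - z) ** 2 for z in samples) / len(samples)


if __name__ == "__main__":
    rng = random.Random(0)
    n = 1000
    samples = [rng.gauss(1.0, 1.0) for _ in range(n)]
    step = 1.0 / math.sqrt(n)

    w_one_pass = sgd(samples, one_pass_indices(n), step)
    w_with_repl = sgd(samples, with_replacement_indices(n, rng), step)

    print("one-pass empirical risk:", empirical_risk(w_one_pass, samples))
    print("with-replacement empirical risk:", empirical_risk(w_with_repl, samples))
```

Passing the index order in as an argument keeps the update rule identical across both variants, so any difference in behavior comes from the sampling scheme alone.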

Videos

Benign Underfitting of Stochastic Gradient Descent · SlidesLive

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Markov Chains and Monte Carlo Methods

Methods: Stochastic Gradient Descent