SGD for Structured Nonconvex Functions: Learning Rates, Minibatching and Interpolation

Robert M. Gower; Othmane Sebbouh; Nicolas Loizou

arXiv:2006.10311 · math.OC · March 23, 2021 · 6 cites

Open Access

TL;DR

This paper establishes new convergence guarantees for SGD on structured non-convex functions, namely quasar (strongly) convex functions and functions satisfying the Polyak-Lojasiewicz condition, and examines the role of minibatching and interpolation.

Contribution

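It bases the convergence analysis on an Expected Residual condition, shown to be strictly weaker than previously used growth, expected smoothness, or bounded variance assumptions, and it characterizes the optimal minibatch size for structured non-convex functions.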

Findings

01. SGD converges to a global minimum under extra structural assumptions.

02. The Expected Residual condition is strictly weaker than previously used assumptions.

03. The optimal minibatch size is characterized for efficient training.
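
For reference, the two function classes named in this summary are usually defined as follows. These are the standard textbook forms, not quoted from the paper; the symbols $\mu$, $\zeta$, $x^*$ and $f^*$ are notation assumed here, and the paper's own notation may differ.

```latex
% Polyak-Lojasiewicz (PL) condition with constant \mu > 0,
% where f^* is the minimum value of f:
\[
  \tfrac{1}{2}\,\|\nabla f(x)\|^2 \;\ge\; \mu\,\bigl(f(x) - f^*\bigr)
  \qquad \text{for all } x .
\]

% Quasar strong convexity with respect to a minimizer x^*,
% with parameters \zeta \in (0, 1] and \mu \ge 0:
\[
  f^* \;\ge\; f(x) + \frac{1}{\zeta}\,\langle \nabla f(x),\, x^* - x \rangle
        + \frac{\mu}{2}\,\|x^* - x\|^2
  \qquad \text{for all } x .
\]
% Setting \mu = 0 gives quasar convexity; \zeta = 1 and \mu = 0 gives star convexity.
```

Neither condition requires $f$ to be convex, yet both rule out spurious stationary points: any point with $\nabla f(x) = 0$ must satisfy $f(x) = f^*$.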

Abstract

Stochastic Gradient Descent (SGD) is being used routinely for optimizing non-convex functions. Yet, the standard convergence theory for SGD in the smooth non-convex setting gives a slow sublinear convergence to a stationary point. In this work, we provide several convergence theorems for SGD showing convergence to a global minimum for non-convex problems satisfying some extra structural assumptions. In particular, we focus on two large classes of structured non-convex functions: (i) Quasar (Strongly) Convex functions (a generalization of convex functions) and (ii) functions satisfying the Polyak-Lojasiewicz condition (a generalization of strongly-convex functions). Our analysis relies on an Expected Residual condition which we show is a strictly weaker assumption than previously used growth conditions, expected smoothness or bounded variance assumptions. We provide theoretical…
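
The setting described in the abstract can be illustrated with a minimal sketch, which is not the paper's experiments or step-size rule. It runs minibatch SGD on an underdetermined, consistent least-squares problem. That objective satisfies the Polyak-Lojasiewicz condition without being strongly convex, and every sample can be fit exactly (interpolation). Least squares is convex, so it stands in only as the simplest PL example. The function name, problem sizes, and the constant step size `1 / L_max` are all assumptions made for this illustration.

```python
import numpy as np


def minibatch_sgd_least_squares(A, b, batch_size, step_size, n_steps, seed=0):
    """Minibatch SGD on f(x) = (1 / 2n) * ||A x - b||^2 (illustrative helper)."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    losses = []
    for _ in range(n_steps):
        # Sample a minibatch of rows without replacement.
        idx = rng.choice(n, size=batch_size, replace=False)
        residual = A[idx] @ x - b[idx]
        # Unbiased estimate of the full gradient A^T (A x - b) / n.
        grad = A[idx].T @ residual / batch_size
        x = x - step_size * grad
        # Track the full objective to monitor progress.
        losses.append(0.5 * np.mean((A @ x - b) ** 2))
    return x, np.array(losses)


# Synthetic problem: fewer equations than unknowns, so f is PL but not
# strongly convex, and b = A @ x_true makes every sample loss zero at x_true.
data_rng = np.random.default_rng(1)
n_samples, dim = 20, 50
A = data_rng.standard_normal((n_samples, dim))
x_true = data_rng.standard_normal(dim)
b = A @ x_true

# Assumed step size: inverse of the largest per-sample smoothness constant.
L_max = np.max(np.sum(A ** 2, axis=1))
x_hat, losses = minibatch_sgd_least_squares(
    A, b, batch_size=5, step_size=1.0 / L_max, n_steps=5000
)

print("initial loss:", losses[0])
print("final loss:  ", losses[-1])
```

Because the stochastic gradients all vanish at a solution, a constant step size suffices here and the loss keeps decreasing rather than stalling at a noise floor. Varying `batch_size` in the call is one way to explore the minibatching trade-off the paper studies.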

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning · Sparse and Compressive Sensing Techniques

Methods: Stochastic Gradient Descent