SGD and Hogwild! Convergence Without the Bounded Gradients Assumption

Lam M. Nguyen; Phuong Ha Nguyen; Marten van Dijk; Peter Richtárik; Katya Scheinberg; Martin Takáč

arXiv:1802.03801·math.OC·July 10, 2018·39 cites

TL;DR

This paper presents a new convergence analysis of stochastic gradient descent (SGD) and the asynchronous Hogwild! algorithm that replaces the traditional assumption of uniformly bounded stochastic gradients with a weaker bound relative to the true gradient norm, and establishes convergence under diminishing learning rates.

Contribution

It introduces a convergence analysis for SGD under relaxed conditions and extends the results to asynchronous parallel Hogwild! with diminishing learning rates.

Findings

01. SGD converges when stochastic gradients are bounded with respect to the true gradient norm

02. Hogwild! converges in the asynchronous setting with diminishing learning rates

03. The conditions are more relaxed than those of previous analyses

Abstract

Stochastic gradient descent (SGD) is the optimization algorithm of choice in many machine learning applications such as regularized empirical risk minimization and training deep neural networks. The classical convergence analysis of SGD is carried out under the assumption that the norm of the stochastic gradient is uniformly bounded. While this might hold for some loss functions, it is always violated for cases where the objective function is strongly convex. In (Bottou et al., 2016), a new analysis of convergence of SGD is performed under the assumption that stochastic gradients are bounded with respect to the true gradient norm. Here we show that for stochastic problems arising in machine learning such a bound always holds; and we also propose an alternative convergence analysis of SGD in the diminishing learning rate regime, which results in more relaxed conditions than those in (Bottou…
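The two ideas in the abstract, SGD with a diminishing learning rate and the failure of a uniform gradient bound under strong convexity, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the 1-D ridge-regularized least-squares objective, the synthetic data, the function names, and the schedule eta_t = c / (t + t0) with its constants. The regularizer makes the objective strongly convex, so the stochastic gradient grows roughly linearly in |w| and cannot be uniformly bounded over all w.

```python
import random


def make_data(n, seed=0):
    """Synthetic 1-D regression data: b_i = 2 * a_i + small noise (illustrative only)."""
    rng = random.Random(seed)
    a = [rng.uniform(0.5, 1.5) for _ in range(n)]
    b = [2.0 * ai + rng.gauss(0.0, 0.1) for ai in a]
    return a, b


def stochastic_gradient(w, ai, bi, lam):
    """Gradient of the per-sample loss f_i(w) = 0.5*(ai*w - bi)**2 + 0.5*lam*w**2."""
    return ai * (ai * w - bi) + lam * w


def exact_minimizer(a, b, lam):
    """Closed-form minimizer of F(w) = (1/n) * sum_i f_i(w)."""
    n = len(a)
    return sum(ai * bi for ai, bi in zip(a, b)) / (sum(ai * ai for ai in a) + n * lam)


def sgd_diminishing(a, b, lam, steps, c=1.0, t0=10.0, seed=1):
    """Plain SGD with the hypothetical diminishing schedule eta_t = c / (t + t0)."""
    rng = random.Random(seed)
    w = 0.0
    for t in range(steps):
        i = rng.randrange(len(a))   # sample one data point uniformly at random
        eta = c / (t + t0)          # learning rate shrinks as t grows
        w -= eta * stochastic_gradient(w, a[i], b[i], lam)
    return w


a, b = make_data(200)
lam = 0.1
w_star = exact_minimizer(a, b, lam)
w_sgd = sgd_diminishing(a, b, lam, steps=20000)
print("distance to minimizer:", abs(w_sgd - w_star))

# The stochastic gradient magnitude grows with |w|, so no uniform bound exists.
for w in (1.0, 1e3, 1e6):
    print(w, abs(stochastic_gradient(w, a[0], b[0], lam)))
```

In this sketch the iterate ends close to the closed-form minimizer even though the per-sample gradient is unbounded in w, which is the situation the abstract describes. The constants c and t0 were chosen only so that the first steps are stable on this toy problem.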

Taxonomy

Topics: Stochastic processes and financial applications · Economic theories and models · Optimization and Variational Analysis

Methods: Stochastic Gradient Descent