SGD and Hogwild! Convergence Without the Bounded Gradients Assumption
Lam M. Nguyen, Phuong Ha Nguyen, Marten van Dijk, Peter Richtárik, Katya Scheinberg, Martin Takáč

TL;DR
This paper presents a new convergence analysis of stochastic gradient descent (SGD) and Hogwild! that drops the traditional bounded-gradient assumption and covers the diminishing learning rate regime for stochastic problems arising in machine learning.
Contribution
It introduces a convergence analysis for SGD under relaxed conditions and extends the results to asynchronous parallel Hogwild! with diminishing learning rates.
Findings
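SGD converges without a uniform bound on the stochastic gradient norm; a bound relative to the true gradient norm always holds for the stochastic problems considered
Hogwild! converges in the asynchronous parallel setting with diminishing learning rates
The conditions are more relaxed than those of previous analyses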
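The findings above can be illustrated on a toy problem. The sketch below is not the authors' code: the 1-D regularized least-squares data, the constants, and the step-size schedule eta_t = 2 / (mu * t + 4 * L) are all assumptions chosen for illustration. It shows that the stochastic gradient grows without bound as w moves away from the minimizer, so a uniform bound cannot hold for this strongly convex objective, while SGD with an O(1/t) learning rate still approaches the minimizer. It also evaluates one standard inequality of the relaxed kind, E[|grad f(w; xi)|^2] <= 4L (F(w) - F(w*)) + 2 E[|grad f(w*; xi)|^2], which follows from convexity and L-smoothness of each component function.

```python
import random

# Toy 1-D regularized least squares (data and constants are illustrative assumptions):
#   f_i(w) = 0.5 * (a_i * w - b_i)**2 + 0.5 * LAM * w**2,   F(w) = mean_i f_i(w)
DATA = [(1.0, 2.1), (2.0, 3.9), (-1.0, -2.2), (0.5, 1.0)]
LAM = 0.1
MU = sum(a * a for a, _ in DATA) / len(DATA) + LAM   # strong convexity constant of F
L = max(a * a for a, _ in DATA) + LAM                # smoothness constant valid for every f_i
W_STAR = (sum(a * b for a, b in DATA) / len(DATA)) / MU  # closed-form minimizer of F


def grad_i(w, a, b):
    """Stochastic gradient: derivative of one component f_i at w."""
    return a * (a * w - b) + LAM * w


def objective(w):
    """Full objective F(w)."""
    return sum(0.5 * (a * w - b) ** 2 + 0.5 * LAM * w * w for a, b in DATA) / len(DATA)


def second_moment(w):
    """E[|grad f(w; xi)|^2] under uniform sampling of the components."""
    return sum(grad_i(w, a, b) ** 2 for a, b in DATA) / len(DATA)


def second_moment_bound(w):
    """Right-hand side 4L (F(w) - F(w*)) + N with N = 2 E[|grad f(w*; xi)|^2]."""
    n_const = 2.0 * second_moment(W_STAR)
    return 4.0 * L * (objective(w) - objective(W_STAR)) + n_const


def sgd(steps=20000, seed=0):
    """Plain SGD with a diminishing learning rate (hypothetical schedule)."""
    rng = random.Random(seed)
    w = 0.0
    for t in range(steps):
        a, b = rng.choice(DATA)
        eta = 2.0 / (MU * t + 4.0 * L)  # O(1/t) decay, starts at 1 / (2L)
        w -= eta * grad_i(w, a, b)
    return w


if __name__ == "__main__":
    print("minimizer:", W_STAR, "SGD iterate:", sgd())
    for w in (0.0, 10.0, 1000.0):
        print(w, second_moment(w), second_moment_bound(w))
```

Running the script prints the second moment of the stochastic gradient next to the bound at a few points; the second moment grows roughly quadratically in the distance from the minimizer, which is why the analysis cannot rely on a constant bound.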
Abstract
Stochastic gradient descent (SGD) is the optimization algorithm of choice in many machine learning applications such as regularized empirical risk minimization and training deep neural networks. The classical convergence analysis of SGD is carried out under the assumption that the norm of the stochastic gradient is uniformly bounded. While this might hold for some loss functions, it is always violated for cases where the objective function is strongly convex. In (Bottou et al., 2016), a new analysis of convergence of SGD is performed under the assumption that stochastic gradients are bounded with respect to the true gradient norm. Here we show that for stochastic problems arising in machine learning such a bound always holds; and we also propose an alternative convergence analysis of SGD with diminishing learning rate regime, which results in more relaxed conditions than those in (Bottou…
Taxonomy
Topics: Stochastic processes and financial applications · Economic theories and models · Optimization and Variational Analysis
Methods: Stochastic Gradient Descent
