Optimized convergence of stochastic gradient descent by weighted averaging

Melinda Hagedorn; Florian Jarre

arXiv:2209.14092·math.OC·October 6, 2022·Optim. Methods Softw.

Open Access

TL;DR

This paper investigates weighted averaging of the iterates in stochastic gradient descent (SGD) to improve convergence, especially under finite termination, by balancing the stochastic error against the optimization error.

Contribution

It derives explicit formulas for the stochastic error and the optimization error, and proposes parameter choices that reduce the optimization error relative to standard arithmetic-mean averaging in stochastic gradient methods.

Findings

01

Weighted averaging can significantly reduce optimization error.

02

Parameter tuning balances stochastic and optimization errors effectively.

03

Numerical results support theoretical improvements and potential generalizations.

Abstract

Under mild assumptions, stochastic gradient methods asymptotically achieve an optimal rate of convergence if the arithmetic mean of all iterates is returned as an approximate optimal solution. However, in the absence of stochastic noise, the arithmetic mean of all iterates converges considerably more slowly to the optimal solution than the iterates themselves. Even in the presence of noise, when finite termination of the stochastic gradient method is considered, the arithmetic mean is not necessarily the best possible approximation to the unknown optimal solution. This paper aims to identify optimal strategies in a particularly simple case: the minimization of a strongly convex function with i.i.d. noise terms and finite termination. Explicit formulas for the stochastic error and the optimization error are derived as functions of certain parameters of the SGD method. The aim was…
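The gap between the arithmetic mean and the iterates themselves can be illustrated with a small experiment. The sketch below is not the paper's method: the objective f(x) = 0.5 x^2, the constant step size, the linearly increasing weights, and the names `run_sgd`, `weighted_average`, and `summarize` are all assumptions chosen for illustration. It runs SGD with additive i.i.d. Gaussian gradient noise and reports the distance to the minimizer x* = 0 for the last iterate, the arithmetic mean, and a weighted mean that favors late iterates.

```python
import random


def run_sgd(n_steps, step=0.1, sigma=0.0, x0=1.0, seed=0):
    """SGD on f(x) = 0.5 * x**2 with additive i.i.d. Gaussian gradient noise.

    Constant step size and this objective are illustrative assumptions.
    """
    rng = random.Random(seed)
    x = x0
    iterates = []
    for _ in range(n_steps):
        # Exact gradient of 0.5 * x**2 is x; sigma scales the noise term.
        grad = x + sigma * rng.gauss(0.0, 1.0)
        x = x - step * grad
        iterates.append(x)
    return iterates


def weighted_average(iterates, weights):
    """Return sum(w_k * x_k) / sum(w_k)."""
    total = sum(weights)
    return sum(w * x for w, x in zip(weights, iterates)) / total


def summarize(iterates):
    """Distance to the minimizer x* = 0 for three candidate outputs."""
    n = len(iterates)
    uniform = [1.0] * n                             # arithmetic mean
    linear = [float(k) for k in range(1, n + 1)]    # hypothetical weights w_k = k
    return {
        "last": abs(iterates[-1]),
        "uniform": abs(weighted_average(iterates, uniform)),
        "linear": abs(weighted_average(iterates, linear)),
    }


noise_free = summarize(run_sgd(200, sigma=0.0))
noisy = summarize(run_sgd(200, sigma=1.0, seed=1))
print("noise-free:", noise_free)
print("noisy:     ", noisy)
```

In the noise-free run the iterates are x_k = 0.9^k, so the last iterate is essentially at the minimizer, while the arithmetic mean is still about 0.045 away and the linearly weighted mean about 0.0045 away. With noise switched on, a single run is random, so the comparison is only meaningful when averaged over many seeds; that trade-off between stochastic error and optimization error is what the choice of weights has to balance.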

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Numerical methods in inverse problems