Weighted Averaged Stochastic Gradient Descent: Asymptotic Normality and Optimality

Ziyang Wei; Wanrong Zhu; Wei Biao Wu

arXiv:2307.06915 · stat.ML · April 8, 2025 · 1 citation

Open Access

TL;DR

This paper studies a general weighted averaging scheme for stochastic gradient descent, establishing its asymptotic normality, proposing an adaptive method with optimal statistical properties, and enabling valid online inference.

Contribution

It introduces a broad class of weighted averaging schemes for SGD, proves their asymptotic normality, and develops an adaptive averaging method with optimal convergence and inference capabilities.

Findings

01. Asymptotic normality of weighted averaged SGD solutions.
02. An adaptive averaging scheme that attains the optimal statistical rate.
03. Insights into the optimal weights for linear models in terms of non-asymptotic MSE.
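For reference, a weighted averaged SGD estimate combines all iterates with normalized weights. The notation below is ours, not the paper's. The limit shown is only the classical uniform-weight (Polyak-Ruppert) case, which holds under standard smoothness and strong convexity assumptions with step sizes $\eta_t = \eta_0 t^{-\alpha}$, $\alpha \in (1/2, 1)$. The paper's conditions and limiting covariance for general weights are not reproduced here.

```latex
% SGD iterate and its weighted average (weights w_t >= 0, W_n = w_1 + ... + w_n)
\theta_t = \theta_{t-1} - \eta_t \, \nabla f(\theta_{t-1}; \xi_t),
\qquad
\bar{\theta}_n = \sum_{t=1}^{n} \frac{w_t}{W_n}\, \theta_t,
\qquad
W_n = \sum_{t=1}^{n} w_t .

% Classical special case w_t \equiv 1 (Polyak--Ruppert averaging)
\sqrt{n}\,\bigl(\bar{\theta}_n - \theta^\ast\bigr)
\xrightarrow{\;d\;}
\mathcal{N}\!\bigl(0,\; A^{-1} S A^{-1}\bigr),
\qquad
A = \nabla^2 F(\theta^\ast),
\quad
S = \mathbb{E}\bigl[\nabla f(\theta^\ast;\xi)\,\nabla f(\theta^\ast;\xi)^{\top}\bigr].
```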

Abstract

Stochastic Gradient Descent (SGD) is one of the most popular algorithms in statistical and machine learning due to its computational and memory efficiency. Various averaging schemes have been proposed to accelerate the convergence of SGD in different settings. In this paper, we explore a general averaging scheme for SGD. Specifically, we establish the asymptotic normality of a broad range of weighted averaged SGD solutions and provide asymptotically valid online inference approaches. Furthermore, we propose an adaptive averaging scheme that exhibits both optimal statistical rate and favorable non-asymptotic convergence, drawing insights from the optimal weight for the linear model in terms of non-asymptotic mean squared error (MSE).
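A minimal sketch of weighted averaged SGD, assuming a linear least-squares model, step sizes $\eta_t = \eta_0 t^{-\alpha}$, and polynomial weights $w_t = t^{\gamma}$. The function name `weighted_averaged_sgd` and all of these choices are hypothetical illustrations, not the paper's adaptive scheme; $\gamma = 0$ recovers uniform Polyak-Ruppert averaging. The average is updated recursively, so it needs only O(d) extra memory, in line with the memory efficiency the abstract mentions.

```python
import numpy as np


def weighted_averaged_sgd(X, y, gamma=1.0, lr0=0.2, alpha=0.6):
    """SGD for least squares with an online weighted average of the iterates.

    Hypothetical illustration: step size lr0 * t**(-alpha) and weights
    w_t = t**gamma are assumed choices, not the paper's adaptive scheme.
    gamma = 0 gives the uniform (Polyak-Ruppert) average.
    """
    n, d = X.shape
    theta = np.zeros(d)      # current SGD iterate
    theta_bar = np.zeros(d)  # running weighted average of the iterates
    w_sum = 0.0              # W_t = w_1 + ... + w_t
    for t in range(1, n + 1):
        x_t, y_t = X[t - 1], y[t - 1]
        grad = (x_t @ theta - y_t) * x_t  # gradient of 0.5 * (x'theta - y)^2
        theta = theta - lr0 * t ** (-alpha) * grad
        w_t = float(t) ** gamma
        w_sum += w_t
        # Recursive form of theta_bar_t = sum_s w_s * theta_s / W_t.
        theta_bar += (w_t / w_sum) * (theta - theta_bar)
    return theta, theta_bar


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, theta_star = 20_000, np.array([1.0, -2.0, 0.5])
    X = rng.standard_normal((n, theta_star.size))
    y = X @ theta_star + 0.5 * rng.standard_normal(n)
    for gamma in (0.0, 1.0, 2.0):
        last, avg = weighted_averaged_sgd(X, y, gamma=gamma)
        print(f"gamma={gamma}: |last - theta*|={np.linalg.norm(last - theta_star):.4f}, "
              f"|avg - theta*|={np.linalg.norm(avg - theta_star):.4f}")
```

Larger $\gamma$ shifts weight toward recent iterates, which discards more of the early transient but typically inflates the asymptotic variance relative to uniform averaging. Choosing the weights in a principled way is what the paper's adaptive scheme addresses; this sketch does not attempt it.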

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Statistical Methods and Inference

Methods: Stochastic Gradient Descent