Towards Understanding the Generalizability of Delayed Stochastic Gradient Descent

Xiaoge Deng; Li Shen; Shengwei Li; Tao Sun; Dongsheng Li; and Dacheng Tao

arXiv:2308.09430 · cs.LG · May 27, 2025

Open Access

TL;DR

This paper derives sharper generalization error bounds for asynchronous delayed stochastic gradient descent and shows, with supporting experiments, that asynchronous delays can reduce the generalization error.

Contribution

The paper establishes sharper generalization error bounds for delayed SGD via generating function analysis, showing that delays can reduce the generalization error.

Findings

01 Asynchronous delays can reduce the generalization error of SGD.

02 New bounds are established for quadratic convex and strongly convex problems.

03 Experimental results support the theoretical findings.

Abstract

Stochastic gradient descent (SGD) performed in an asynchronous manner plays a crucial role in training large-scale machine learning models. However, the generalization performance of asynchronous delayed SGD, which is an essential metric for assessing machine learning algorithms, has rarely been explored. Existing generalization error bounds are rather pessimistic and cannot reveal the correlation between asynchronous delays and generalization. In this paper, we investigate a sharper generalization error bound for SGD with asynchronous delay $\tau$. Leveraging the generating function analysis tool, we first establish the average stability of the delayed gradient algorithm. Based on this algorithmic stability, we provide upper bounds on the generalization error of $\tilde{O}\left(\frac{T - \tau}{n \tau}\right)$ and $\tilde{O}\left(\frac{1}{n}\right)$ for quadratic convex and strongly convex…
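For readers unfamiliar with the setting, the following is a minimal sketch of SGD with a fixed delay $\tau$ on a one-dimensional least-squares problem. It assumes the commonly used update $w_{t+1} = w_t - \eta \nabla f(w_{t-\tau}; z_{i_t})$, where the gradient is evaluated at an iterate that is $\tau$ steps old; the abstract above does not spell out the paper's exact formulation, so this form, the function name `delayed_sgd`, the toy data, and the step size are all illustrative assumptions rather than the authors' setup.

```python
import random
from collections import deque


def delayed_sgd(data, tau, steps, lr, seed=0):
    """Hypothetical sketch: SGD on the loss 0.5 * (w * x - y)**2 where each
    gradient is computed at the iterate from `tau` steps earlier."""
    rng = random.Random(seed)
    w = 0.0
    # Buffer of the last tau + 1 iterates; history[0] is w_{t - tau}.
    # For t < tau the stale iterate falls back to the initial point.
    history = deque([w] * (tau + 1), maxlen=tau + 1)
    for _ in range(steps):
        x, y = rng.choice(data)          # sample one training example
        stale_w = history[0]             # iterate seen by the delayed worker
        grad = (stale_w * x - y) * x     # gradient evaluated at the stale iterate
        w = w - lr * grad                # update applied to the current iterate
        history.append(w)                # oldest iterate is dropped automatically
    return w


# Noise-free toy data generated by y = 2 * x (an assumption for illustration).
toy_data = [(0.5, 1.0), (1.0, 2.0), (1.5, 3.0)]
w_sync = delayed_sgd(toy_data, tau=0, steps=2000, lr=0.05)
w_delayed = delayed_sgd(toy_data, tau=3, steps=2000, lr=0.05)
```

Setting `tau=0` recovers ordinary SGD, so the same function can be used to compare the synchronous and delayed runs. The step size is kept small because stale gradients narrow the range of step sizes for which the iteration stays stable.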

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Sparse and Compressive Sensing Techniques

Methods: Stochastic Gradient Descent