Towards Understanding the Generalizability of Delayed Stochastic Gradient Descent
Xiaoge Deng, Li Shen, Shengwei Li, Tao Sun, Dongsheng Li, and Dacheng Tao

TL;DR
This paper derives new theoretical bounds on the generalization error of asynchronous delayed stochastic gradient descent (SGD) and shows, with supporting experiments, that delays can reduce generalization error.
Contribution
The paper introduces sharper generalization error bounds for delayed SGD using generating function analysis, revealing that delays can improve generalization.
Findings
Asynchronous delays can reduce the generalization error of SGD.
New bounds are established for quadratic convex and strongly convex problems.
Experimental results support the theoretical findings.
Abstract
Stochastic gradient descent (SGD) performed in an asynchronous manner plays a crucial role in training large-scale machine learning models. However, the generalization performance of asynchronous delayed SGD, which is an essential metric for assessing machine learning algorithms, has rarely been explored. Existing generalization error bounds are rather pessimistic and cannot reveal the correlation between asynchronous delays and generalization. In this paper, we investigate sharper generalization error bounds for SGD with asynchronous delay $\tau$. Leveraging the generating function analysis tool, we first establish the average stability of the delayed gradient algorithm. Based on this algorithmic stability, we provide upper bounds on the generalization error of $\tilde{\mathcal{O}}(\frac{T-\tau}{n\tau})$ and $\tilde{\mathcal{O}}(\frac{1}{n})$ for quadratic convex and strongly convex…
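To make the notion of an asynchronous delay concrete, the following is a minimal sketch of a delayed gradient update on a one-dimensional least-squares problem. It assumes the common form of the delayed update, in which the gradient at step t is evaluated at the iterate from a fixed number of steps earlier; the fixed delay, the toy data, and the names `make_data` and `delayed_sgd` are illustrative assumptions and are not taken from the paper. The sketch only shows the update mechanics and does not reproduce the paper's bounds or experiments.

```python
import random
from collections import deque


def make_data(n, true_w=2.0, noise=0.1, seed=0):
    """Hypothetical toy dataset: y = true_w * x + Gaussian noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        y = true_w * x + rng.gauss(0.0, noise)
        data.append((x, y))
    return data


def delayed_sgd(data, tau, steps, lr, seed=0):
    """SGD on the loss 0.5 * (w * x - y)^2 where each gradient is
    evaluated at the iterate from `tau` steps earlier.
    tau = 0 recovers ordinary SGD."""
    rng = random.Random(seed)
    w = 0.0
    # Sliding window of the last tau + 1 iterates; history[0] is the
    # stale iterate (the initial point is used until enough steps exist).
    history = deque([w] * (tau + 1), maxlen=tau + 1)
    for _ in range(steps):
        x, y = data[rng.randrange(len(data))]
        stale_w = history[0]
        grad = (stale_w * x - y) * x  # gradient taken at the stale iterate
        w = w - lr * grad             # but applied to the current iterate
        history.append(w)
    return w


if __name__ == "__main__":
    train = make_data(50)
    for tau in (0, 2, 5):
        print(tau, delayed_sgd(train, tau=tau, steps=2000, lr=0.05))
```

The sliding window keeps the delay bookkeeping in one place, so changing `tau` is the only difference between the synchronous and delayed runs.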
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Sparse and Compressive Sensing Techniques
Methods: Stochastic Gradient Descent
