Minibatch and Local SGD: Algorithmic Stability and Linear Speedup in Generalization

Yunwen Lei; Tao Sun; Mingrui Liu

arXiv:2310.01139 · cs.LG · October 14, 2025 · 2 cites

Open Access

TL;DR

This paper analyzes the stability and generalization of minibatch and local SGD, demonstrating their linear speedup in achieving optimal risk bounds and highlighting the role of training errors in generalization for overparameterized models.

Contribution

It provides a stability and generalization analysis of minibatch and local SGD based on an expectation-variance decomposition, establishes a linear speedup in their risk bounds, and incorporates training errors into the stability analysis.

Findings

01. Minibatch and local SGD attain optimal risk bounds with a linear speedup in the number of machines.

02. Small training errors help generalization for overparameterized models.

03. The theoretical framework connects stability, training errors, and generalization.

Abstract

The increasing scale of data propels the popularity of leveraging parallelism to speed up the optimization. Minibatch stochastic gradient descent (minibatch SGD) and local SGD are two popular methods for parallel optimization. The existing theoretical studies show a linear speedup of these methods with respect to the number of machines, which, however, is measured by optimization errors in a multi-pass setting. As a comparison, the stability and generalization of these methods are much less studied. In this paper, we study the stability and generalization analysis of minibatch and local SGD to understand their learnability by introducing an expectation-variance decomposition. We incorporate training errors into the stability analysis, which shows how small training errors help generalization for overparameterized models. We show minibatch and local SGD achieve a linear speedup to attain…
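To make the two methods named in the abstract concrete, here is a minimal sketch on a toy one-dimensional least-squares problem. It is not the paper's analysis or code: the data model, the helper names (`grad`, `minibatch_sgd`, `local_sgd`, `make_data`), the sampling-with-replacement scheme, and all hyperparameters are assumptions chosen for illustration. Minibatch SGD averages gradients from all machines at a shared iterate in every round, while local SGD lets each machine take several single-sample steps on its own before the iterates are averaged.

```python
import random

def grad(w, x, y):
    # Gradient of the squared loss 0.5 * (w*x - y)**2 with respect to w.
    return (w * x - y) * x

def minibatch_sgd(data, machines, batch, rounds, lr, rng):
    # Every round: each machine averages gradients over its own batch,
    # all evaluated at the shared iterate w; then one averaged step is taken.
    w = 0.0
    for _ in range(rounds):
        g = 0.0
        for _ in range(machines):
            samples = [rng.choice(data) for _ in range(batch)]
            g += sum(grad(w, x, y) for x, y in samples) / batch
        w -= lr * g / machines
    return w

def local_sgd(data, machines, local_steps, rounds, lr, rng):
    # Every round: each machine starts from the shared iterate, runs
    # `local_steps` single-sample SGD steps, and the results are averaged.
    w = 0.0
    for _ in range(rounds):
        local_iterates = []
        for _ in range(machines):
            v = w
            for _ in range(local_steps):
                x, y = rng.choice(data)
                v -= lr * grad(v, x, y)
            local_iterates.append(v)
        w = sum(local_iterates) / machines
    return w

def make_data(n, w_true, rng):
    # Hypothetical synthetic data: y = w_true * x + Gaussian noise.
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        data.append((x, w_true * x + rng.gauss(0.0, 0.1)))
    return data

rng = random.Random(0)
data = make_data(200, 2.0, rng)
w_mb = minibatch_sgd(data, machines=4, batch=8, rounds=300, lr=0.5, rng=rng)
w_loc = local_sgd(data, machines=4, local_steps=8, rounds=300, lr=0.5, rng=rng)
```

With these settings both methods process the same number of samples per round (4 machines times 8 samples), and both estimates should land close to the true slope of 2.0. The difference lies only in how often the machines communicate within those samples.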

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Machine Learning and Data Classification · Machine Learning and Algorithms

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Stochastic Gradient Descent · Local SGD