Minibatch vs Local SGD for Heterogeneous Distributed Learning
Blake Woodworth, Kumar Kshitij Patel, Nathan Srebro

TL;DR
This paper compares Minibatch SGD and Local SGD for heterogeneous distributed learning. It argues that Minibatch SGD dominates all existing analyses of Local SGD in this setting, that accelerated Minibatch SGD is optimal when heterogeneity is high, and it gives a new upper bound for Local SGD.
Contribution
It argues that Minibatch SGD, even without acceleration, dominates all existing analyses of Local SGD in the heterogeneous setting, and it presents the first upper bound for Local SGD that improves over Minibatch SGD in a non-homogeneous regime.
Findings
Minibatch SGD, even without acceleration, dominates all existing analyses of Local SGD in the heterogeneous setting
Accelerated Minibatch SGD is optimal when the heterogeneity is high
A new upper bound for Local SGD improves over Minibatch SGD in a non-homogeneous regime
Abstract
We analyze Local SGD (aka parallel or federated SGD) and Minibatch SGD in the heterogeneous distributed setting, where each machine has access to stochastic gradient estimates for a different, machine-specific, convex objective; the goal is to optimize w.r.t. the average objective; and machines can only communicate intermittently. We argue that, (i) Minibatch SGD (even without acceleration) dominates all existing analysis of Local SGD in this setting, (ii) accelerated Minibatch SGD is optimal when the heterogeneity is high, and (iii) present the first upper bound for Local SGD that improves over Minibatch SGD in a non-homogeneous regime.
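The setting in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the scalar quadratic objectives f_m(x) = 0.5 * a_m * (x - b_m)^2, the Gaussian gradient noise, the constant step size, and the function names. Both methods get the same budget of `rounds` communications and `local_steps` stochastic gradients per machine per round. Minibatch SGD spends them on one averaged step at the shared iterate, while Local SGD takes sequential local steps and then averages the iterates.

```python
import random


def stochastic_grad(x, a_m, b_m, sigma, rng):
    # Noisy gradient of the hypothetical machine objective
    # f_m(x) = 0.5 * a_m * (x - b_m)**2, with a_m > 0 so f_m is convex.
    return a_m * (x - b_m) + rng.gauss(0.0, sigma)


def minibatch_sgd(objectives, rounds, local_steps, lr, sigma, seed=0):
    # One update per communication round: every machine evaluates
    # `local_steps` stochastic gradients at the SAME shared iterate,
    # and all of them are averaged into a single step.
    rng = random.Random(seed)
    x = 0.0
    for _ in range(rounds):
        grads = [
            stochastic_grad(x, a, b, sigma, rng)
            for (a, b) in objectives
            for _k in range(local_steps)
        ]
        x -= lr * sum(grads) / len(grads)
    return x


def local_sgd(objectives, rounds, local_steps, lr, sigma, seed=0):
    # Each machine starts from the shared iterate, runs `local_steps`
    # SGD steps on its OWN objective, then the iterates are averaged.
    rng = random.Random(seed)
    x = 0.0
    for _ in range(rounds):
        local_iterates = []
        for (a, b) in objectives:
            x_m = x
            for _k in range(local_steps):
                x_m -= lr * stochastic_grad(x_m, a, b, sigma, rng)
            local_iterates.append(x_m)
        x = sum(local_iterates) / len(local_iterates)
    return x


def average_objective_minimizer(objectives):
    # Minimizer of the average of the quadratics: curvature-weighted mean of b_m.
    return sum(a * b for a, b in objectives) / sum(a for a, _ in objectives)
```

With heterogeneous objectives such as `[(1.0, -1.0), (3.0, 2.0)]`, the output of either function can be compared against `average_objective_minimizer` for different values of `local_steps`. With a single local step per round the two methods take the same update. This toy problem is only meant to make the two update rules concrete and does not reproduce the paper's bounds.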
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Sparse and Compressive Sensing Techniques
Methods: Local SGD · Stochastic Gradient Descent
