Is Local SGD Better than Minibatch SGD?

Blake Woodworth; Kumar Kshitij Patel; Sebastian U. Stich; Zhen Dai; Brian Bullins; H. Brendan McMahan; Ohad Shamir; Nathan Srebro

arXiv:2002.07839 · cs.LG · July 21, 2020 · 45 cites

TL;DR

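This paper compares the theoretical guarantees of local SGD and minibatch SGD. It proves that local SGD strictly dominates minibatch SGD for quadratic objectives, gives a new guarantee for general convex objectives that sometimes improves over minibatch SGD, and presents a lower bound showing that local SGD does not dominate minibatch SGD in general.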

Contribution

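It shows that all previously existing guarantees for local SGD in the convex setting are dominated by minibatch SGD, provides the first guarantee for general convex objectives that at least sometimes improves over minibatch SGD, and clarifies when local SGD can and cannot outperform it.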

Findings

01 Local SGD strictly dominates minibatch SGD for quadratic objectives.

02 Accelerated local SGD is minimax optimal for quadratics.

03 A lower bound on the performance of local SGD is worse than the minibatch SGD guarantee, so local SGD does not dominate minibatch SGD.

Abstract

We study local SGD (also known as parallel SGD and federated averaging), a natural and frequently used stochastic distributed optimization method. Its theoretical foundations are currently lacking and we highlight how all existing error guarantees in the convex setting are dominated by a simple baseline, minibatch SGD. (1) For quadratic objectives we prove that local SGD strictly dominates minibatch SGD and that accelerated local SGD is minimax optimal for quadratics; (2) For general convex objectives we provide the first guarantee that at least sometimes improves over minibatch SGD; (3) We show that indeed local SGD does not dominate minibatch SGD by presenting a lower bound on the performance of local SGD that is worse than the minibatch SGD guarantee.
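To make the two methods in the abstract concrete, the following is a minimal sketch of local SGD and minibatch SGD on a quadratic objective. It is an illustration under stated assumptions, not the paper's experimental setup: the objective f(x) = 0.5 xᵀAx − bᵀx, the additive Gaussian gradient noise, the constant step size, and all names (`stochastic_grad`, `local_sgd`, `minibatch_sgd`, `M`, `K`, `R`, `lr`, `noise_std`) are hypothetical choices made for this example. Both routines use R communication rounds and M·K stochastic gradients per round, so they are compared at equal gradient and communication budgets.

```python
import numpy as np


def stochastic_grad(x, A, b, rng, noise_std):
    # Gradient of the assumed objective f(x) = 0.5 * x^T A x - b^T x,
    # perturbed by additive Gaussian noise (an assumption of this sketch).
    return A @ x - b + noise_std * rng.standard_normal(x.shape)


def local_sgd(x0, A, b, M, K, R, lr, noise_std, seed=0):
    # M machines each run K local SGD steps from the shared iterate,
    # then the M local iterates are averaged; this repeats for R rounds.
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(R):
        local_iterates = []
        for _ in range(M):
            y = x.copy()
            for _ in range(K):
                y = y - lr * stochastic_grad(y, A, b, rng, noise_std)
            local_iterates.append(y)
        x = np.mean(local_iterates, axis=0)  # communication step
    return x


def minibatch_sgd(x0, A, b, M, K, R, lr, noise_std, seed=0):
    # R steps of SGD, each using a minibatch of M * K stochastic gradients
    # evaluated at the same point, so the gradient budget matches local_sgd.
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(R):
        grads = [stochastic_grad(x, A, b, rng, noise_std) for _ in range(M * K)]
        x = x - lr * np.mean(grads, axis=0)
    return x


if __name__ == "__main__":
    A = np.diag([1.0, 2.0])
    b = np.array([1.0, 2.0])
    x_star = np.linalg.solve(A, b)  # minimizer of the quadratic
    kwargs = dict(M=4, K=5, R=100, lr=0.1, noise_std=0.5)
    for name, method in [("local", local_sgd), ("minibatch", minibatch_sgd)]:
        x = method(np.zeros(2), A, b, **kwargs)
        print(name, "distance to optimum:", np.linalg.norm(x - x_star))
```

The only structural difference is where the gradients are evaluated: minibatch SGD computes all M·K gradients of a round at one point, while local SGD lets each machine move between its K gradient evaluations and averages afterwards. Which method ends up closer to the optimum in a single run depends on the step size, the noise level, and the seed, so this sketch should not be read as reproducing the paper's bounds.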

Videos

Is Local SGD Better than Minibatch SGD? · SlidesLive

Taxonomy

Topics: MRI in cancer diagnosis · Advanced Bandit Algorithms Research · Privacy-Preserving Technologies in Data

Methods: Local SGD · Stochastic Gradient Descent