Local SGD Accelerates Convergence by Exploiting Second Order Information of the Loss Function

Linxuan Pan; Shenghui Song

arXiv:2305.15013·cs.LG·May 29, 2023·1 cites

Open Access

TL;DR

This paper demonstrates that local SGD accelerates convergence by leveraging second order information of the loss function, explaining its effectiveness in distributed learning and its potential to approach Newton's method.

Contribution

The paper provides a theoretical analysis showing how local SGD exploits second order information, which was previously not well understood.

Findings

01  L-SGD explores second order information of the loss function.

02  L-SGD has larger projections on eigenvectors with small eigenvalues.

03  L-SGD can approach the Newton method under certain conditions.
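
The findings above can be illustrated on a toy problem. The sketch below is not the paper's analysis: it assumes a noise-free quadratic loss f(w) = 0.5 * w^T H w with a diagonal Hessian H, and it replaces stochastic mini-batch gradients with exact gradients. Under those assumptions, K small local steps with learning rate lr give the update (I - (I - lr*H)^K) w0, while one large step with learning rate K*lr gives K*lr*H w0. All function names, the eigenvalues, and the values of lr and K are hypothetical choices made for illustration.

```python
import numpy as np

def local_gd_update(H, w0, lr, K):
    """Total displacement after K small gradient steps (noise-free stand-in for L-SGD)."""
    w = w0.copy()
    for _ in range(K):
        w = w - lr * (H @ w)  # gradient of 0.5 * w^T H w is H w
    return w0 - w

def single_step_update(H, w0, lr, K):
    """Displacement of one gradient step with the enlarged learning rate K * lr."""
    return K * lr * (H @ w0)

def unit(v):
    return v / np.linalg.norm(v)

def cosine(a, b):
    return float(unit(a) @ unit(b))

# Assumed toy Hessian: eigenvectors are the coordinate axes,
# eigenvalues span two orders of magnitude.
eigvals = np.array([0.01, 0.1, 1.0])
H = np.diag(eigvals)
w0 = np.ones(3)
lr, K = 0.5, 20  # lr * max eigenvalue < 1, so the small steps are stable

local = local_gd_update(H, w0, lr, K)
single = single_step_update(H, w0, lr, K)
newton = np.linalg.solve(H, H @ w0)  # Newton step H^{-1} grad, which equals w0 here

# Normalized projections onto each eigenvector (smallest eigenvalue first).
print("local  direction:", np.round(unit(local), 3))
print("single direction:", np.round(unit(single), 3))
print("cosine(local, newton): ", round(cosine(local, newton), 3))
print("cosine(single, newton):", round(cosine(single, newton), 3))
```

In this toy setting the normalized multi-step update carries more weight on the small-eigenvalue directions than the single large step does, and it is better aligned with the Newton step. As K grows, the factor (I - lr*H)^K vanishes and the multi-step update tends to the Newton step itself. Whether and how this carries over to stochastic gradients is what the paper addresses.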

Abstract

With multiple iterations of updates, local stochastic gradient descent (L-SGD) has been proven to be very effective in distributed machine learning schemes such as federated learning. In fact, many innovative works have shown that L-SGD with independent and identically distributed (IID) data can even outperform SGD. As a result, extensive efforts have been made to unveil the power of L-SGD. However, existing analyses fail to explain why multiple local updates with small mini-batches of data (L-SGD) cannot be replaced by a single update with one big batch of data and a larger learning rate (SGD). In this paper, we offer a new perspective to understand the strength of L-SGD. We theoretically prove that, with IID data, L-SGD can effectively explore the second order information of the loss function. In particular, compared with SGD, the updates of L-SGD have much larger projection on the…

Taxonomy

Topics: Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques · Machine Learning and ELM

Methods: Stochastic Gradient Descent