Effectiveness of Distributed Gradient Descent with Local Steps for Overparameterized Models
Heng Zhu, Harsh Vardhan, Arya Mazumdar

TL;DR
This paper analyzes the implicit bias of distributed Local Gradient Descent (Local-GD) in overparameterized models. It shows that Local-GD converges in direction to the centralized solution and provides convergence rates, which helps explain why the method remains effective with many local steps.
Contribution
The paper provides the first detailed analysis of the implicit bias of Local-GD in the interpolation regime, demonstrating convergence in direction to the centralized model regardless of the number of local steps.
Findings
Local-GD converges to the centralized model in direction.
The convergence rate depends on the number of local steps.
A modified Local-GD achieves a learning rate that is independent of the number of local steps.
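For readers unfamiliar with the phrase, "converges in direction" is usually taken to mean that the normalized iterates converge, even though the norm of the iterates may grow without bound on separable data. The symbols below are notation assumed here for illustration, not taken from the paper: w_t is the aggregated global model after t communication rounds and \hat{w} is the centralized model.

```latex
\lim_{t \to \infty} \frac{w_t}{\lVert w_t \rVert} = \frac{\hat{w}}{\lVert \hat{w} \rVert}
```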
Abstract
In distributed training of machine learning models, gradient descent with local iterative steps, commonly known as Local (Stochastic) Gradient Descent (Local-(S)GD) or Federated averaging (FedAvg), is a very popular method to mitigate communication burden. In this method, gradient steps based on local datasets are taken independently in distributed compute nodes to update the local models, which are then aggregated intermittently. In the interpolation regime, Local-GD can converge to zero training loss. However, with many potential solutions corresponding to zero training loss, it is not known which solution Local-GD converges to. In this work we answer this question by analyzing implicit bias of Local-GD for classification tasks with linearly separable data. For the interpolation regime, our analysis shows that the aggregated global model obtained from Local-GD, with arbitrary number…
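The procedure described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a linear classifier, the logistic loss, equal-sized shards, plain averaging at each communication round, and a tiny hand-made separable dataset. The names logistic_grad, local_gd, cosine, and all hyperparameter values are hypothetical.

```python
import numpy as np

def logistic_grad(w, X, y):
    """Gradient of the mean logistic loss log(1 + exp(-y * <x, w>))."""
    margins = y * (X @ w)
    # sigmoid(-m) written via tanh for numerical stability
    sig_neg = 0.5 * (1.0 - np.tanh(0.5 * margins))
    return -(X * (y * sig_neg)[:, None]).mean(axis=0)

def local_gd(shards, dim, rounds, local_steps, lr):
    """Local-GD: each node runs `local_steps` GD steps, then models are averaged."""
    w_global = np.zeros(dim)
    for _ in range(rounds):
        local_models = []
        for X, y in shards:
            w = w_global.copy()  # every node starts from the current global model
            for _ in range(local_steps):
                w -= lr * logistic_grad(w, X, y)
            local_models.append(w)
        # aggregation step (assumes equal-sized shards, so a plain mean)
        w_global = np.mean(local_models, axis=0)
    return w_global

def cosine(u, v):
    """Cosine similarity, used to compare the directions of two models."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy linearly separable data split across two nodes (illustrative only).
shards = [
    (np.array([[2.0, 1.0], [3.0, 0.5], [-2.0, -1.0], [-3.0, 0.0]]),
     np.array([1.0, 1.0, -1.0, -1.0])),
    (np.array([[1.0, 2.0], [2.0, 3.0], [-1.0, -2.0], [-2.0, -2.5]]),
     np.array([1.0, 1.0, -1.0, -1.0])),
]
X_all = np.vstack([X for X, _ in shards])
y_all = np.concatenate([y for _, y in shards])

w_local = local_gd(shards, dim=2, rounds=200, local_steps=5, lr=0.1)
# Centralized GD is the special case of one node holding all data, one step per round.
w_central = local_gd([(X_all, y_all)], dim=2, rounds=1000, local_steps=1, lr=0.1)

print("cosine(local, centralized):", cosine(w_local, w_central))
```

Comparing normalized directions rather than raw weights is deliberate: on separable data the weight norm keeps growing, so only the direction is a meaningful limit. With local_steps=1 and equal shards the sketch coincides with centralized gradient descent.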
Taxonomy
Topics: Neural Networks and Applications · Computer Graphics and Visualization Techniques · Medical Image Segmentation Techniques
Methods: Sparse Evolutionary Training
