Effectiveness of Distributed Gradient Descent with Local Steps for Overparameterized Models

Heng Zhu; Harsh Vardhan; Arya Mazumdar

arXiv:2412.07971·cs.LG·March 24, 2026

TL;DR

This paper analyzes the implicit bias of distributed Local Gradient Descent (Local-GD) in overparameterized models. It shows that Local-GD converges in direction to the centralized solution and gives convergence rates, which helps explain why the method remains effective with many local steps.

Contribution

The paper provides the first detailed analysis of the implicit bias of Local-GD in the interpolation regime, demonstrating convergence in direction to the centralized model regardless of the number of local steps.

Findings

01 Local-GD converges to the centralized model in direction.

02 The convergence rate depends on the number of local steps.

03 A modified Local-GD variant admits a learning rate that is independent of the number of local steps.
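
The phrase "converges in direction" in the findings is usually formalized as below. This is a sketch of the standard definition, not a statement copied from the paper: the symbols $w_t$ (the aggregated Local-GD model after round $t$) and $\hat{w}$ (the centralized solution) are notation assumed here for illustration.

```latex
\lim_{t \to \infty} \frac{w_t}{\lVert w_t \rVert}
  \;=\; \frac{\hat{w}}{\lVert \hat{w} \rVert}
```

On linearly separable data the logistic loss has no finite minimizer, so the norm of the iterates grows without bound and only the normalized direction can be compared meaningfully.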

Abstract

In distributed training of machine learning models, gradient descent with local iterative steps, commonly known as Local (Stochastic) Gradient Descent (Local-(S)GD) or Federated Averaging (FedAvg), is a very popular method to mitigate the communication burden. In this method, gradient steps based on local datasets are taken independently in distributed compute nodes to update the local models, which are then aggregated intermittently. In the interpolation regime, Local-GD can converge to zero training loss. However, with many potential solutions corresponding to zero training loss, it is not known which solution Local-GD converges to. In this work we answer this question by analyzing the implicit bias of Local-GD for classification tasks with linearly separable data. For the interpolation regime, our analysis shows that the aggregated global model obtained from Local-GD, with an arbitrary number…
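
The procedure described in the abstract (independent local gradient steps followed by intermittent aggregation) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a linear classifier with logistic loss, full-batch local steps, equal-size shards, and plain averaging as the aggregation rule. All function names, hyperparameters, and the synthetic data generator are hypothetical.

```python
import numpy as np

def logistic_grad(w, X, y):
    """Gradient of the mean logistic loss log(1 + exp(-y * <x, w>))."""
    margins = y * (X @ w)
    # sigmoid(-m) = exp(-log(1 + exp(m))), computed stably with logaddexp
    coeff = -y * np.exp(-np.logaddexp(0.0, margins))
    return (X * coeff[:, None]).mean(axis=0)

def local_gd(shards, dim, rounds, local_steps, lr):
    """Local-GD sketch: every node starts each round from the global model,
    takes `local_steps` gradient steps on its own shard, and the resulting
    local models are averaged (assumed aggregation rule)."""
    w_global = np.zeros(dim)
    for _ in range(rounds):
        local_models = []
        for X, y in shards:
            w = w_global.copy()
            for _ in range(local_steps):
                w -= lr * logistic_grad(w, X, y)
            local_models.append(w)
        w_global = np.mean(local_models, axis=0)
    return w_global

def centralized_gd(X, y, steps, lr):
    """Plain gradient descent on the pooled dataset, for comparison."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= lr * logistic_grad(w, X, y)
    return w

def cosine(u, v):
    """Cosine similarity, used to compare directions rather than norms."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def make_separable_data(n, dim, seed):
    """Synthetic data labeled by a random linear rule, hence linearly separable."""
    rng = np.random.default_rng(seed)
    w_true = rng.normal(size=dim)
    X = rng.normal(size=(n, dim))
    y = np.where(X @ w_true >= 0, 1.0, -1.0)
    return X, y

if __name__ == "__main__":
    dim, n, nodes = 200, 40, 4          # dim > n: overparameterized setting
    X, y = make_separable_data(n, dim, seed=0)
    shards = [(X[i::nodes], y[i::nodes]) for i in range(nodes)]
    w_local = local_gd(shards, dim, rounds=50, local_steps=10, lr=0.1)
    w_central = centralized_gd(X, y, steps=500, lr=0.1)
    print("training accuracy (Local-GD):", np.mean(y * (X @ w_local) > 0))
    print("cosine(Local-GD, centralized):", cosine(w_local, w_central))
```

Comparing the two models by cosine similarity rather than by distance reflects the "in direction" notion of convergence: on separable data the weight norms keep growing, so only the normalized vectors are comparable. A finite run like this one can only suggest the trend; it does not verify the asymptotic result.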

Taxonomy

Topics: Neural Networks and Applications · Computer Graphics and Visualization Techniques · Medical Image Segmentation Techniques

Methods: Sparse Evolutionary Training