GRAWA: Gradient-based Weighted Averaging for Distributed Training of Deep Learning Models

Tolga Dimlioglu; Anna Choromanska

arXiv:2403.04206·cs.LG·March 8, 2024·1 cites

TL;DR

This paper introduces GRAWA, a gradient-based weighted averaging algorithm for distributed deep learning that improves convergence speed and model quality by prioritizing flat regions in the optimization landscape.

Contribution

It proposes a novel weighted averaging method with theoretical convergence guarantees and demonstrates superior empirical performance over existing methods.

Findings

01 Faster convergence compared to baseline methods

02 Achieves better quality and flatter local optima

03 Requires less communication and fewer updates

Abstract

We study distributed training of deep learning models in time-constrained environments. We propose a new algorithm that periodically pulls workers towards the center variable computed as a weighted average of workers, where the weights are inversely proportional to the gradient norms of the workers such that recovering the flat regions in the optimization landscape is prioritized. We develop two asynchronous variants of the proposed algorithm that we call Model-level and Layer-level Gradient-based Weighted Averaging (resp. MGRAWA and LGRAWA), which differ in terms of the weighting scheme that is either done with respect to the entire model or is applied layer-wise. On the theoretical front, we prove the convergence guarantee for the proposed approach in both convex and non-convex settings. We then experimentally demonstrate that our algorithms outperform the competitor methods by…
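The weighting idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it is a synchronous, single-process toy in plain Python, whereas the paper describes asynchronous variants. It assumes that the weights are 1/||gradient|| normalized to sum to 1, and that the "pull" is a simple interpolation towards the center with a hypothetical strength `lam`. All function names, `lam`, and `eps` are illustrative assumptions, not names from the paper or the repository.

```python
import math


def grad_norm(grad):
    """Euclidean norm of a flat gradient vector (list of floats)."""
    return math.sqrt(sum(g * g for g in grad))


def inverse_norm_weights(grads, eps=1e-12):
    """Weights proportional to 1/||grad||, normalized to sum to 1.

    The normalization and the eps guard are assumptions of this sketch.
    """
    inv = [1.0 / (grad_norm(g) + eps) for g in grads]
    total = sum(inv)
    return [v / total for v in inv]


def weighted_center(params, weights):
    """Center variable: weighted average of the workers' parameters."""
    dim = len(params[0])
    return [sum(w * p[k] for w, p in zip(weights, params)) for k in range(dim)]


def pull_towards(param, center, lam):
    """Move one worker a fraction lam of the way towards the center."""
    return [(1.0 - lam) * p + lam * c for p, c in zip(param, center)]


def model_level_step(params, grads, lam=0.5):
    """Model-level weighting: one weight per worker, from its full gradient."""
    weights = inverse_norm_weights(grads)
    center = weighted_center(params, weights)
    pulled = [pull_towards(p, center, lam) for p in params]
    return pulled, center, weights


def layer_level_step(layered_params, layered_grads, lam=0.5):
    """Layer-level weighting: repeat the model-level step for each layer.

    layered_params[i][l] is the flat parameter vector of layer l on worker i.
    """
    n_layers = len(layered_params[0])
    result = [[] for _ in layered_params]
    for l in range(n_layers):
        layer_p = [worker[l] for worker in layered_params]
        layer_g = [worker[l] for worker in layered_grads]
        pulled, _, _ = model_level_step(layer_p, layer_g, lam)
        for i, p in enumerate(pulled):
            result[i].append(p)
    return result


# Two workers: the second has the smaller gradient norm (1 vs 5),
# so it receives the larger weight and dominates the center.
params = [[0.0, 0.0], [4.0, 4.0]]
grads = [[3.0, 4.0], [0.6, 0.8]]
pulled, center, weights = model_level_step(params, grads, lam=0.5)
```

In this toy run the worker with the smaller gradient norm gets the larger weight, so the center lands closer to it. That matches the abstract's stated intent of prioritizing flat regions, under the assumptions listed above.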

Code & Models

Repositories

tolgadimli/grawa
PyTorch · Official

Taxonomy

Topics: Advanced Neural Network Applications · Medical Imaging and Analysis · Brain Tumor Detection and Classification