Communication-Efficient Distributed Deep Learning via Federated Dynamic Averaging
Michail Theologitis, Georgios Frangias, Georgios Anestis, Vasilis Samoladas, Antonios Deligiannakis

TL;DR
This paper introduces Federated Dynamic Averaging (FDA), a communication-efficient distributed deep learning method that reduces synchronization frequency based on model divergence, significantly lowering communication costs while maintaining performance.
Contribution
The paper proposes FDA, a novel dynamic synchronization strategy for distributed deep learning that adapts to model divergence, improving communication efficiency in federated settings.
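To make the idea of divergence-triggered synchronization concrete, here is a minimal toy sketch in Python. It is not the paper's algorithm: this summary does not say how FDA measures or estimates divergence, so the sketch assumes divergence is the mean squared distance of the local models from their average, and it computes that quantity centrally (which in a real deployment would itself cost communication). All names (`train`, `divergence`, `threshold`, the quadratic toy loss) are hypothetical and chosen for illustration only.

```python
import random


def average(models):
    """Coordinate-wise mean of a list of equal-length parameter vectors."""
    k = len(models)
    return [sum(col) / k for col in zip(*models)]


def divergence(models):
    """Assumed divergence measure: mean squared distance from the average model."""
    mean = average(models)
    total = sum(
        sum((w - m) ** 2 for w, m in zip(model, mean)) for model in models
    )
    return total / len(models)


def local_sgd_step(model, target, lr, rng, noise=0.05):
    """One noisy gradient step on the toy loss 0.5 * ||model - target||^2."""
    return [w - lr * ((w - t) + rng.gauss(0.0, noise)) for w, t in zip(model, target)]


def train(num_workers=4, dim=5, steps=200, threshold=0.5, lr=0.1, seed=0):
    """Run local training and synchronize only when divergence exceeds threshold.

    Returns the final averaged model and the number of synchronizations.
    """
    rng = random.Random(seed)
    # Each worker gets its own optimum, standing in for heterogeneous local data.
    targets = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_workers)]
    models = [[0.0] * dim for _ in range(num_workers)]
    syncs = 0
    for _ in range(steps):
        # Local training: every worker updates its own copy independently.
        models = [local_sgd_step(m, t, lr, rng) for m, t in zip(models, targets)]
        # Dynamic rule: average the models only once they have drifted apart.
        if divergence(models) > threshold:
            global_model = average(models)
            models = [list(global_model) for _ in range(num_workers)]
            syncs += 1
    return average(models), syncs


if __name__ == "__main__":
    _, dynamic_syncs = train(threshold=0.5)
    _, every_step_syncs = train(threshold=0.0)
    print("syncs with divergence threshold:", dynamic_syncs)
    print("syncs when synchronizing every step:", every_step_syncs)
```

Setting `threshold=0.0` makes the sketch synchronize after every step, which mimics a rigid schedule; a larger threshold lets workers train locally for longer between synchronizations, at the price of letting their models drift further apart.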
Findings
FDA reduces communication costs by orders of magnitude.
FDA maintains robust performance across data heterogeneity.
FDA outperforms traditional and existing communication-efficient algorithms.
Abstract
The ever-growing volume and decentralized nature of data, coupled with the need to harness it and extract knowledge, have led to the extensive use of distributed deep learning (DDL) techniques for training. These techniques rely on local training performed at distributed nodes using locally collected data, followed by a periodic synchronization process that combines these models to create a unified global model. However, the frequent synchronization of deep learning models, encompassing millions to many billions of parameters, creates a communication bottleneck, severely hindering scalability. Worse yet, DDL algorithms typically waste valuable bandwidth and render themselves less practical in bandwidth-constrained federated settings by relying on overly simplistic, periodic, and rigid synchronization schedules. These inefficiencies make the training process increasingly impractical as…
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Brain Tumor Detection and Classification · Stochastic Gradient Optimization Techniques
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
