DynaComm: Accelerating Distributed CNN Training between Edges and Clouds through Dynamic Communication Scheduling

Shangming Cai; Dongsheng Wang; Haixia Wang; Yongqiang Lyu; Guangquan Xu; Xi Zheng; Athanasios V. Vasilakos

arXiv:2101.07968 · cs.DC · October 11, 2021

TL;DR

DynaComm is a dynamic scheduler that accelerates distributed CNN training at the network edge by optimizing communication and computation overlap, reducing training time without sacrificing accuracy.

Contribution

The paper introduces a novel dynamic communication scheduling method that decomposes each transmission procedure into segments, so that layer-wise communication and computation overlap optimally during distributed CNN training.

Findings

1. Achieves optimal layer-wise scheduling compared to other strategies.

2. Reduces training time without affecting model accuracy.

3. Effective in edge-cloud distributed CNN training environments.

Abstract

To reduce upload bandwidth and address privacy concerns, deep learning at the network edge has become an emerging topic. Typically, edge devices collaboratively train a shared model on data generated in real time through the Parameter Server framework. Although the edge devices share the computing workload, distributed training over edge networks is still time-consuming because of the parameter and gradient transmission procedures between parameter servers and edge devices. Focusing on accelerating distributed Convolutional Neural Network (CNN) training at the network edge, we present DynaComm, a novel scheduler that dynamically decomposes each transmission procedure into several segments to achieve optimal layer-wise overlap of communication and computation at run-time. Through experiments, we verify that DynaComm manages to achieve optimal layer-wise…
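
The idea of splitting one transmission into segments so that computation can start before all parameters arrive can be illustrated with a toy model. The sketch below is not DynaComm's algorithm, which this summary does not describe. It assumes a hypothetical cost model in which each layer has a fixed transfer time and forward-compute time, every segment pays a fixed startup `overhead`, and segments are sent back to back over one link. The names `finish_time` and `best_segmentation` are illustrative, and the search is a brute force that is only practical for a handful of layers.

```python
from itertools import combinations


def finish_time(comm, comp, cuts, overhead):
    """Finish time of one forward pass when parameters arrive in segments.

    comm[i], comp[i]: assumed transfer and compute time of layer i.
    cuts: sorted layer indices at which a new segment starts (0 excluded).
    overhead: assumed fixed startup cost paid by every segment.
    """
    n = len(comm)
    bounds = [0, *cuts, n]
    net_done = 0.0  # time at which the most recent segment has fully arrived
    cpu_done = 0.0  # time at which the most recent layer finished computing
    for start, end in zip(bounds, bounds[1:]):
        # Segments are transmitted one after another on a single link.
        net_done += overhead + sum(comm[start:end])
        for layer in range(start, end):
            # A layer runs once its segment arrived and the previous layer is done.
            cpu_done = max(cpu_done, net_done) + comp[layer]
    return cpu_done


def best_segmentation(comm, comp, overhead):
    """Try every way of cutting the layers into contiguous segments."""
    n = len(comm)
    best = None
    for k in range(n):
        for cuts in combinations(range(1, n), k):
            t = finish_time(comm, comp, cuts, overhead)
            if best is None or t < best[0]:
                best = (t, cuts)
    return best


if __name__ == "__main__":
    comm = [1.0, 1.0, 1.0, 1.0]
    comp = [2.0, 2.0, 2.0, 2.0]
    print("one segment:", finish_time(comm, comp, (), 0.5))
    print("one segment per layer:", finish_time(comm, comp, (1, 2, 3), 0.5))
    print("best found:", best_segmentation(comm, comp, 0.5))
```

In this toy model, a small per-segment overhead favors fine-grained segments because computation hides most of the transfer time, while a large overhead favors sending everything at once. That trade-off is why a fixed segmentation is not always the best choice.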
