Distributed Training of Deep Learning Models: A Taxonomic Perspective

Matthias Langer, Zhen He, Wenny Rahayu, and Yanbo Xue

arXiv:2007.03970 · cs.DC · July 9, 2020

TL;DR

This paper provides a comprehensive taxonomy of distributed deep learning systems, analyzing their fundamental principles, techniques, and implications to facilitate understanding and comparison of various approaches in distributed training.

Contribution

It introduces a taxonomy categorizing distributed deep learning systems based on their techniques and principles, aiding in systematic analysis and comparison.

Findings

1. Identifies key principles underlying distributed deep learning
2. Categorizes existing techniques into a coherent taxonomy
3. Highlights implications of different approaches on training efficiency

Abstract

Distributed deep learning systems (DDLS) train deep neural network models by utilizing the distributed resources of a cluster. Developers of DDLS are required to make many decisions to process their particular workloads in their chosen environment efficiently. The advent of GPU-based deep learning, the ever-increasing size of datasets and deep neural network models, in combination with the bandwidth constraints that exist in cluster environments require developers of DDLS to be innovative in order to train high quality models quickly. Comparing DDLS side-by-side is difficult due to their extensive feature lists and architectural deviations. We aim to shine some light on the fundamental principles that are at work when training deep neural networks in a cluster of independent machines by analyzing the general properties associated with training deep learning models and how such workloads…
