A Survey and Empirical Evaluation of Parallel Deep Learning Frameworks

Daniel Nichols; Siddharth Singh; Shu-Huai Lin; Abhinav Bhatele

arXiv:2111.04949 · cs.LG · July 4, 2022 · 5 cites

TL;DR

This paper surveys and empirically evaluates current distributed deep learning frameworks, analyzing their performance, efficiency, and memory use on large-scale image and language tasks to identify bottlenecks and areas for improvement.

Contribution

It provides a comprehensive comparison of state-of-the-art distributed deep learning frameworks through empirical testing and analysis of their performance and efficiency.

Findings

01. Performance varies significantly across frameworks.

02. Memory consumption and statistical efficiency differ among methods.

03. Identifies key bottlenecks limiting scalability.

Abstract

The field of deep learning has witnessed a remarkable shift towards extremely compute- and memory-intensive neural networks. These newer, larger models have enabled researchers to advance state-of-the-art tools across a variety of fields. This phenomenon has spurred the development of algorithms for distributed training of neural networks over a larger number of hardware accelerators. In this paper, we discuss and compare current state-of-the-art frameworks for large-scale distributed deep learning. First, we survey current practices in distributed learning and identify the different types of parallelism used. Then, we present empirical results comparing their performance on large image and language training tasks. Additionally, we address their statistical efficiency and memory consumption behavior. Based on our results, we discuss algorithmic and implementation portions of each…

Taxonomy

Topics: Advanced Neural Network Applications · Brain Tumor Detection and Classification · COVID-19 diagnosis using AI