Distributed Training of Deep Neural Networks: Theoretical and Practical Limits of Parallel Scalability

Janis Keuper; Franz-Josef Pfreundt

arXiv:1609.06870 · cs.CV · December 6, 2016

TL;DR

This paper analyzes the scalability limits of distributed deep neural network training, revealing that current data-parallel SGD approaches face communication bottlenecks and theoretical constraints that hinder effective scaling beyond a few dozen nodes.

Contribution

It provides a theoretical framework and practical insights into the communication bottlenecks and scalability constraints of distributed DNN training.

Findings

01. Data-parallel SGD becomes communication-bound at scale.

02. Theoretical constraints limit effective scaling to a few dozen nodes.

03. Poor scalability is observed in most practical scenarios.

Abstract

This paper presents a theoretical analysis and practical evaluation of the main bottlenecks towards a scalable distributed solution for the training of Deep Neural Networks (DNNs). The presented results show that the current state-of-the-art approach, using data-parallelized Stochastic Gradient Descent (SGD), is quickly turning into a vastly communication-bound problem. In addition, we present simple but fixed theoretical constraints that prevent effective scaling of DNN training beyond only a few dozen nodes. This leads to poor scalability of DNN training in most practical scenarios.
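
To make the communication-bound claim concrete, here is a minimal toy cost model. It is not the model used in the paper. It assumes strong scaling (a fixed global batch split evenly across workers), no overlap of compute and communication, and a bandwidth-optimal ring all-reduce in which each worker transfers `2 * (n - 1) / n` times the gradient size per iteration. The function names and all numeric values (1 s of single-node compute, 250 MB of gradients, a 10 Gbit/s link) are hypothetical and chosen only for illustration.

```python
def iteration_time(n_workers, compute_s, model_bytes, bandwidth_bytes_per_s):
    """Toy per-iteration time for data-parallel SGD (illustrative assumption only)."""
    # Compute shrinks linearly because the fixed global batch is split across workers.
    compute = compute_s / n_workers
    if n_workers == 1:
        return compute  # a single worker exchanges no gradients
    # Ring all-reduce: each worker sends and receives 2 * (n - 1) / n of the gradient size.
    comm = 2 * (n_workers - 1) / n_workers * model_bytes / bandwidth_bytes_per_s
    return compute + comm


def speedup(n_workers, compute_s, model_bytes, bandwidth_bytes_per_s):
    """Speedup of n workers relative to one worker under the toy model."""
    t1 = iteration_time(1, compute_s, model_bytes, bandwidth_bytes_per_s)
    tn = iteration_time(n_workers, compute_s, model_bytes, bandwidth_bytes_per_s)
    return t1 / tn


if __name__ == "__main__":
    # Hypothetical numbers: 1 s compute on one node, 250 MB gradients, 10 Gbit/s link.
    COMPUTE_S, MODEL_BYTES, BANDWIDTH = 1.0, 250e6, 1.25e9
    for n in (1, 2, 4, 8, 16, 32, 64):
        s = speedup(n, COMPUTE_S, MODEL_BYTES, BANDWIDTH)
        print(f"{n:3d} workers: speedup {s:.2f}")
```

In this sketch the communication term stays roughly constant while the compute term shrinks as `1 / n`, so the speedup approaches `compute_s / (2 * model_bytes / bandwidth_bytes_per_s)` no matter how many workers are added. Real systems differ in the details (latency, overlap, batch-size effects), but the saturation pattern is the kind of limit the abstract describes.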

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Machine Learning and ELM