Distributed Training of Deep Neural Networks: Theoretical and Practical Limits of Parallel Scalability
Janis Keuper, Franz-Josef Pfreundt

TL;DR
This paper analyzes the scalability limits of distributed deep neural network training, revealing that current data-parallel SGD approaches face communication bottlenecks and theoretical constraints that hinder effective scaling beyond a few dozen nodes.
Contribution
It provides a theoretical framework and practical insights into the communication bottlenecks and scalability constraints of distributed DNN training.
Findings
Data-parallel SGD becomes communication-bound at scale.
Theoretical constraints limit effective scaling to a few dozen nodes.
As a result, DNN training scales poorly in most practical scenarios.
Abstract
This paper presents a theoretical analysis and practical evaluation of the main bottlenecks towards a scalable distributed solution for the training of Deep Neural Networks (DNNs). The presented results show that the current state-of-the-art approach, data-parallelized Stochastic Gradient Descent (SGD), is quickly turning into a vastly communication-bound problem. In addition, we present simple but fixed theoretical constraints that prevent effective scaling of DNN training beyond only a few dozen nodes. This leads to poor scalability of DNN training in most practical scenarios.
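To make the communication-bound claim concrete, the following is a minimal sketch of a toy cost model for one data-parallel SGD step. It is not the model derived in the paper. It assumes a fixed global batch split evenly across nodes, a ring all-reduce for gradient exchange, and no overlap of computation and communication. The function names and the numbers used (1 s single-node compute time, a 250 MB gradient, 10 Gbit/s links) are hypothetical and chosen only for illustration.

```python
def step_time(n_nodes, t_compute_single, model_bytes, bandwidth_bytes_per_s):
    """Estimated wall time of one data-parallel SGD step (toy model)."""
    # Assumption: compute for a fixed global batch divides evenly across nodes.
    compute = t_compute_single / n_nodes
    if n_nodes == 1:
        comm = 0.0  # a single node has nothing to exchange
    else:
        # Assumption: ring all-reduce, where each node transfers about
        # 2 * (n - 1) / n times the gradient size per step.
        comm = 2.0 * (n_nodes - 1) / n_nodes * model_bytes / bandwidth_bytes_per_s
    return compute + comm


def speedup(n_nodes, t_compute_single, model_bytes, bandwidth_bytes_per_s):
    """Speedup of n_nodes relative to a single node under the toy model."""
    base = step_time(1, t_compute_single, model_bytes, bandwidth_bytes_per_s)
    return base / step_time(n_nodes, t_compute_single, model_bytes, bandwidth_bytes_per_s)


if __name__ == "__main__":
    # Hypothetical workload: 1 s compute per step, 250 MB gradient, 10 Gbit/s links.
    T_COMPUTE, MODEL_BYTES, BANDWIDTH = 1.0, 250e6, 1.25e9
    for n in (1, 2, 4, 8, 16, 32, 64):
        s = speedup(n, T_COMPUTE, MODEL_BYTES, BANDWIDTH)
        print(f"{n:3d} nodes: speedup {s:.2f}")
```

Under these assumptions the compute term shrinks as 1/n while the communication term approaches a constant, so the speedup saturates at roughly t_compute_single / (2 * model_bytes / bandwidth_bytes_per_s) no matter how many nodes are added. Real systems can shift this limit by overlapping communication with computation or by growing the batch size, which this sketch deliberately ignores.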
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Machine Learning and ELM
