Distributed Training of Deep Neural Networks with Theoretical Analysis: Under SSP Setting
Abhimanu Kumar, Pengtao Xie, Junming Yin, Eric P. Xing

TL;DR
This paper introduces a distributed training method for deep neural networks that guarantees convergence and demonstrates near-linear scalability, significantly reducing training time on large datasets like ImageNet.
Contribution
It presents a theoretically guaranteed, scalable distributed training scheme for DNNs that converges to the same optima as traditional training, with empirical validation across multiple datasets.
Findings
Achieves close to 6x speedup on ImageNet with 6 machines.
Proves convergence to the same optima as non-distributed training.
Provides novel insights into layerwise and probabilistic convergence.
Abstract
We propose a distributed approach to training deep neural networks (DNNs) with theoretically guaranteed convergence and strong empirical scalability: training is close to 6 times faster on an instance of the ImageNet data set when run on 6 machines. The proposed scheme is close to optimally scalable in the number of machines and is guaranteed to converge to the same optima as the undistributed setting. The convergence and scalability of the distributed setting are shown empirically across different datasets (TIMIT and ImageNet) and machine learning tasks (image classification and phoneme extraction). The convergence analysis provides novel insights into this complex learning scheme, including: 1) layerwise convergence, and 2) convergence of the weights in probability.
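The title places this work in the SSP setting, commonly expanded as stale synchronous parallel, where workers update shared parameters without a full barrier but no worker may run more than a fixed number of steps ahead of the slowest one. The sketch below is a toy, single-process illustration of that bounded-staleness rule, not the paper's implementation. It swaps the DNN for a linear least-squares model, and every name in it (`ssp_sgd`, `staleness`, `n_clocks`, the synthetic data) is an assumption made for the example.

```python
import numpy as np

def ssp_sgd(X, y, n_workers=3, staleness=2, n_clocks=200, lr=0.05, seed=0):
    """Toy simulation of SGD under a bounded-staleness (SSP-style) rule.

    Hypothetical helper: a linear model stands in for a DNN, and random
    scheduling stands in for workers running at different speeds.
    """
    rng = np.random.default_rng(seed)
    shards = np.array_split(np.arange(len(X)), n_workers)  # one data shard per worker
    w = np.zeros(X.shape[1])      # shared parameters (the "server" copy)
    clocks = [0] * n_workers      # completed steps per worker
    max_gap = 0                   # largest observed fastest-minus-slowest gap

    while min(clocks) < n_clocks:
        p = int(rng.integers(n_workers))  # pick which worker gets to run next
        slowest = min(clocks)
        # Staleness rule: a worker may start a new step only if it is at most
        # `staleness` steps ahead of the slowest worker; otherwise it waits.
        if clocks[p] >= n_clocks or clocks[p] - slowest > staleness:
            continue
        # The worker reads the shared weights, which may lack updates that
        # slower workers have not yet produced, and applies one SGD step.
        i = shards[p][clocks[p] % len(shards[p])]
        grad = (X[i] @ w - y[i]) * X[i]
        w -= lr * grad
        clocks[p] += 1
        max_gap = max(max_gap, max(clocks) - min(clocks))
    return w, max_gap

def mse(X, y, w):
    """Mean squared error of the linear model with weights w."""
    return float(np.mean((X @ w - y) ** 2))

# Synthetic, noise-free regression problem (illustrative data only).
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(120, 4))
w_true = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ w_true

w_ssp, gap = ssp_sgd(X, y)
```

Setting `staleness=0` in this sketch forces workers to advance in near lockstep, while larger values let fast workers run further ahead at the cost of reading older parameters. That trade-off is what a convergence analysis in this setting has to control.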
Taxonomy
Topics: Neural Networks and Applications · Machine Learning and Algorithms · Adversarial Robustness in Machine Learning
