Scale out for large minibatch SGD: Residual network training on ImageNet-1K with improved accuracy and reduced time to train
Valeriu Codreanu, Damian Podareanu, and Vikram Saletore

TL;DR
This paper demonstrates scalable training of ResNet-50 on ImageNet-1K on petaflop-scale supercomputers, improving accuracy and cutting training time to as little as 28 minutes through new training techniques and software tools.
Contribution
The paper introduces a methodology for scaling ResNet-50 training across supercomputer nodes that maintains over 90% scaling efficiency up to 104K cores, together with a Collapsed Ensemble technique that reaches 77.5% top-1 accuracy.
Findings
Over 90% scaling efficiency up to 104K cores
Training time reduced to 28 minutes for ResNet-50
Achieved 77.5% top-1 accuracy with Collapsed Ensemble
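The page does not say how the scaling-efficiency figure above is computed. A common convention, assumed here, is the ratio of measured speedup to ideal linear speedup relative to a baseline run. The sketch below illustrates that convention only. The function name and the throughput numbers are hypothetical and are not taken from the paper.

```python
def scaling_efficiency(base_workers, base_throughput, workers, throughput):
    """Return measured speedup divided by ideal (linear) speedup.

    Assumed definition, not confirmed by the paper: throughput is
    images/second, and a value of 1.0 means perfectly linear scaling.
    """
    if min(base_workers, workers) <= 0:
        raise ValueError("worker counts must be positive")
    if min(base_throughput, throughput) <= 0:
        raise ValueError("throughputs must be positive")
    measured_speedup = throughput / base_throughput
    ideal_speedup = workers / base_workers
    return measured_speedup / ideal_speedup


# Hypothetical numbers for illustration only: one worker processes
# 100 images/s, and 256 workers together process 23,552 images/s.
eff = scaling_efficiency(
    base_workers=1, base_throughput=100.0,
    workers=256, throughput=23_552.0,
)
print(f"{eff:.1%}")  # prints 92.0%
```

Under this definition, "over 90% scaling efficiency" would mean the aggregate throughput stays above 0.9 times the baseline throughput multiplied by the increase in worker count.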
Abstract
For the past 5 years, the ILSVRC competition and the ImageNet dataset have attracted a lot of interest from the Computer Vision community, allowing for state-of-the-art accuracy to grow tremendously. This should be credited to the use of deep artificial neural network designs. As these became more complex, the storage, bandwidth, and compute requirements increased. This means that with a non-distributed approach, even when using the most high-density server available, the training process may take weeks, making it prohibitive. Furthermore, as datasets grow, the representation learning potential of deep networks grows as well by using more complex models. This synchronicity triggers a sharp increase in the computational requirements and motivates us to explore the scaling behaviour on petaflop scale supercomputers. In this paper we will describe the challenges and novel solutions needed…
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning
