Yet Another Accelerated SGD: ResNet-50 Training on ImageNet in 74.7 seconds
Masafumi Yamazaki, Akihiko Kasagi, Akihiro Tabuchi, Takumi Honda, Masahiro Miwa, Naoto Fukumoto, Tsuguchika Tabaru, Atsushi Ike, Kohta Nakashima

TL;DR
This paper presents a novel optimization approach enabling ResNet-50 to be trained on ImageNet in just 74.7 seconds using 2,048 GPUs, significantly advancing the speed of deep learning training.
Contribution
It introduces new optimization methods that enable ultra-fast distributed training of deep neural networks on large GPU clusters.
Findings
Training time of 74.7 seconds for ResNet-50 on ImageNet
Achieved training throughput of over 1.73 million images/sec
Top-1 validation accuracy of 75.08%
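To put the throughput figure in per-device terms, the short calculation below divides the reported aggregate throughput by the GPU count. It is a back-of-envelope sketch that assumes the load is spread evenly across all 2,048 GPUs. Because the paper reports "over" 1.73 million images/sec, the result is a lower bound on the average. The helper name `per_gpu_throughput` is illustrative only.

```python
# Back-of-envelope arithmetic on the reported figures.
# Assumption: the aggregate throughput is split evenly across all GPUs.
REPORTED_THROUGHPUT = 1.73e6  # images/sec; the paper reports "over" this value
NUM_GPUS = 2048


def per_gpu_throughput(total_images_per_sec: float, num_gpus: int) -> float:
    """Average images/sec handled by each GPU under an even split (hypothetical helper)."""
    return total_images_per_sec / num_gpus


if __name__ == "__main__":
    # 1.73e6 / 2048 = 844.7265625
    print(f"{per_gpu_throughput(REPORTED_THROUGHPUT, NUM_GPUS):.1f} images/sec per GPU")
```

This works out to roughly 845 images/sec per GPU, sustained across the whole cluster.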
Abstract
There has been strong demand for algorithms that can execute machine learning as fast as possible, and the speed of deep learning has increased 30-fold in the past two years alone. Distributed deep learning with large mini-batches is a key technology for meeting this demand. It is also a great challenge, because it is difficult to achieve high scalability on large clusters without compromising accuracy. In this paper, we introduce the optimization methods we applied to this challenge. Applying these methods, we achieved a training time of 74.7 seconds using 2,048 GPUs on the ABCI cluster. The training throughput is over 1.73 million images/sec, and the top-1 validation accuracy is 75.08%.
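The abstract relies on large mini-batch distributed training without spelling out how it works. The sketch below shows generic synchronous data-parallel SGD, the usual setting for this kind of training. It is not the authors' implementation and does not include the specific optimization methods the paper introduces. The toy linear model and the names `grad_mse`, `allreduce_mean`, and `data_parallel_step` are hypothetical. A real system would average gradients with an allreduce from a communication library such as NCCL or MPI; here a plain Python average stands in for it.

With equal-sized shards, averaging the workers' local gradients gives exactly the gradient of the full global mini-batch. That is why adding GPUs effectively enlarges the mini-batch.

```python
import random


def grad_mse(w, b, batch):
    """Gradient of mean squared error for the toy model y ~ w*x + b over (x, y) pairs."""
    n = len(batch)
    gw = sum(2 * (w * x + b - y) * x for x, y in batch) / n
    gb = sum(2 * (w * x + b - y) for x, y in batch) / n
    return gw, gb


def allreduce_mean(grads):
    """Stand-in for an allreduce: element-wise average of per-worker gradient tuples."""
    k = len(grads)
    return tuple(sum(g[i] for g in grads) / k for i in range(len(grads[0])))


def data_parallel_step(w, b, global_batch, num_workers, lr):
    """One synchronous SGD step: shard the global mini-batch across workers,
    compute local gradients, average them, and apply the same update everywhere."""
    assert len(global_batch) % num_workers == 0, "equal-sized shards assumed"
    shard = len(global_batch) // num_workers
    local_grads = [
        grad_mse(w, b, global_batch[i * shard:(i + 1) * shard])
        for i in range(num_workers)
    ]
    gw, gb = allreduce_mean(local_grads)
    return w - lr * gw, b - lr * gb


if __name__ == "__main__":
    rng = random.Random(0)
    # Noise-free toy data generated from w=3, b=1 (illustrative only).
    data = [(x, 3.0 * x + 1.0) for x in (rng.uniform(-1, 1) for _ in range(64))]
    w, b = 0.0, 0.0
    for _ in range(500):
        w, b = data_parallel_step(w, b, data, num_workers=8, lr=0.1)
    print(f"w={w:.3f}, b={b:.3f}")  # approaches w=3, b=1
```

Running one step with `num_workers=8` and another with `num_workers=1` on the same data yields the same parameters up to floating-point rounding. Real large mini-batch training differs mainly in the optimizer tricks needed to keep accuracy at very large global batch sizes.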
