Extremely Large Minibatch SGD: Training ResNet-50 on ImageNet in 15 Minutes
Takuya Akiba, Shuji Suzuki, Keisuke Fukuda

TL;DR
This paper trains ResNet-50 on ImageNet for 90 epochs in 15 minutes on 1024 Tesla P100 GPUs, using a minibatch size of 32k together with several techniques that maintain accuracy at that scale.
Contribution
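The paper shows that a large-scale network can be trained rapidly with a very large minibatch by combining training techniques (RMSprop warm-up, batch normalization without moving averages, and a slow-start learning rate schedule) with a purpose-built hardware and software system.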
Findings
ResNet-50 trained on ImageNet for 90 epochs in 15 minutes on 1024 Tesla P100 GPUs
Accuracy maintained at a minibatch size of 32k
Hardware and software of the training system described in detail
Abstract
We demonstrate that training ResNet-50 on ImageNet for 90 epochs can be achieved in 15 minutes with 1024 Tesla P100 GPUs. This was made possible by using a large minibatch size of 32k. To maintain accuracy with this large minibatch size, we employed several techniques such as RMSprop warm-up, batch normalization without moving averages, and a slow-start learning rate schedule. This paper also describes the details of the hardware and software of the system used to achieve the above performance.
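The figures in the abstract imply a rough per-iteration time budget, which the following back-of-the-envelope sketch works out. It is not taken from the paper: it assumes that "32k" means 32,768, that the training set is the standard ImageNet-1k split of 1,281,167 images, and that the full 15 minutes is spent on training iterations. The function name `training_budget` and its parameters are hypothetical.

```python
import math

# Assumptions (not stated in the abstract):
IMAGENET_TRAIN_IMAGES = 1_281_167  # standard ImageNet-1k training set size
GLOBAL_BATCH = 32 * 1024           # reading "32k" as 32,768

# Figures taken from the abstract:
NUM_GPUS = 1024
EPOCHS = 90
WALL_CLOCK_SECONDS = 15 * 60


def training_budget(num_images, global_batch, num_gpus, epochs, seconds):
    """Derive per-GPU batch size, iteration counts, and throughput."""
    per_gpu_batch = global_batch // num_gpus
    # The last partial batch of each epoch is counted as a full iteration.
    iters_per_epoch = math.ceil(num_images / global_batch)
    total_iterations = iters_per_epoch * epochs
    images_per_second = num_images * epochs / seconds
    return {
        "per_gpu_batch": per_gpu_batch,
        "iters_per_epoch": iters_per_epoch,
        "total_iterations": total_iterations,
        "seconds_per_iteration": seconds / total_iterations,
        "images_per_second": images_per_second,
        "images_per_second_per_gpu": images_per_second / num_gpus,
    }


if __name__ == "__main__":
    budget = training_budget(
        IMAGENET_TRAIN_IMAGES, GLOBAL_BATCH, NUM_GPUS, EPOCHS, WALL_CLOCK_SECONDS
    )
    for name, value in budget.items():
        print(f"{name}: {value:,.2f}")
```

Under these assumptions each GPU processes 32 images per iteration, an epoch is only about 40 iterations, and the whole run is a few thousand parameter updates. That small update count is one reason large-minibatch training needs extra care to maintain accuracy.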
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
Methods: RMSProp
