ImageNet Training in Minutes
Yang You, Zhao Zhang, Cho-Jui Hsieh, James Demmel, Kurt Keutzer

TL;DR
This paper shows that large-batch synchronous SGD with LARS (layer-wise adaptive rate scaling) can train ResNet-50 on ImageNet in 14 minutes without losing accuracy.
Contribution
The authors use LARS to scale synchronous SGD to very large batch sizes, which cuts ImageNet training on large-scale hardware to minutes without sacrificing accuracy.
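The layer-wise scaling behind LARS can be illustrated with a minimal sketch. It assumes the basic update rule from the LARS paper (You, Gitman, Ginsburg, 2017) without momentum. The function names, the hyperparameter values, and the fallback for zero norms are illustrative assumptions, not the authors' implementation.

```python
import math


def l2_norm(values):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(v * v for v in values))


def lars_step(weights, grads, global_lr=0.1, trust_coef=0.001, weight_decay=0.0005):
    """Apply one momentum-free LARS update to a list of layers.

    weights and grads are lists of layers; each layer is a flat list of floats.
    All default hyperparameter values are placeholders chosen for illustration.
    """
    new_weights = []
    for w, g in zip(weights, grads):
        w_norm = l2_norm(w)
        g_norm = l2_norm(g)
        if w_norm == 0.0 or g_norm == 0.0:
            # Assumed fallback: use the plain global rate when a norm is zero.
            local_lr = 1.0
        else:
            # Layer-wise rate: trust_coef * ||w|| / (||g|| + weight_decay * ||w||)
            local_lr = trust_coef * w_norm / (g_norm + weight_decay * w_norm)
        step = global_lr * local_lr
        new_weights.append(
            [wi - step * (gi + weight_decay * wi) for wi, gi in zip(w, g)]
        )
    return new_weights


# Two layers whose gradient magnitudes differ by five orders of magnitude.
weights = [[3.0, 4.0], [0.3, 0.4]]
grads = [[100.0, 0.0], [0.0, 0.001]]
updated = lars_step(weights, grads, global_lr=1.0, trust_coef=0.01, weight_decay=0.0)

# Relative change ||delta w|| / ||w|| per layer.
relative_changes = [
    l2_norm([n - o for n, o in zip(new, old)]) / l2_norm(old)
    for new, old in zip(updated, weights)
]
```

With weight decay set to zero, every layer moves by the same relative amount (global_lr * trust_coef) regardless of its gradient scale. This is the property that keeps very large learning rates, and hence very large batches, from destabilizing individual layers.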
Findings
ResNet-50 training completed in 14 minutes with 74.9% accuracy.
AlexNet trained in 11 minutes on 1024 CPUs.
At batch sizes above 16K, accuracy is higher than that of previous large-batch methods.
Abstract
Finishing 90-epoch ImageNet-1k training with ResNet-50 on an NVIDIA M40 GPU takes 14 days. This training requires 10^18 single precision operations in total. On the other hand, the world's current fastest supercomputer can finish 2 * 10^17 single precision operations per second (Dongarra et al. 2017, https://www.top500.org/lists/2017/06/). If we can make full use of the supercomputer for DNN training, we should be able to finish the 90-epoch ResNet-50 training in one minute. However, the current bottleneck for fast DNN training is at the algorithm level. Specifically, the current batch size (e.g. 512) is too small to make efficient use of many processors. For large-scale DNN training, we focus on using large-batch data-parallel synchronous SGD without losing accuracy in the fixed epochs. The LARS algorithm (You, Gitman, Ginsburg, 2017, arXiv:1708.03888) enables us to scale the batch…
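The back-of-the-envelope estimate in the abstract can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted above (10^18 operations, 2 * 10^17 operations per second, 14 days on one M40); the variable names are illustrative, and the calculation assumes perfect scaling with no communication or I/O overhead.

```python
# Figures quoted in the abstract.
total_ops = 1e18            # single precision operations for 90 epochs of ResNet-50
peak_ops_per_sec = 2e17     # quoted peak rate of the fastest supercomputer
m40_days = 14               # quoted training time on one NVIDIA M40

# Ideal time if the whole machine ran at its quoted peak rate.
ideal_seconds = total_ops / peak_ops_per_sec

# Fraction of the peak rate needed to finish within one minute.
utilization_for_one_minute = ideal_seconds / 60.0

# Sustained rate implied by the 14-day single-GPU run.
m40_seconds = m40_days * 24 * 3600
m40_sustained_ops_per_sec = total_ops / m40_seconds

print(f"ideal time at peak: {ideal_seconds:.1f} s")
print(f"utilization needed for one minute: {utilization_for_one_minute:.1%}")
print(f"implied M40 sustained rate: {m40_sustained_ops_per_sec:.2e} ops/s")
```

At the quoted peak rate the ideal time is 5 seconds, so the one-minute target stated in the abstract corresponds to sustaining roughly 8% of peak. The gap between that bound and real training times is what the abstract attributes to the algorithm, in particular to batch sizes too small to keep many processors busy.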
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Medical Image Segmentation Techniques
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · LARS
