Extremely Large Minibatch SGD: Training ResNet-50 on ImageNet in 15 Minutes

Takuya Akiba; Shuji Suzuki; Keisuke Fukuda

arXiv:1711.04325·cs.DC·November 15, 2017·282 cites

TL;DR

This paper shows that ResNet-50 can be trained on ImageNet for 90 epochs in 15 minutes on 1024 Tesla P100 GPUs with a minibatch size of 32k. Accuracy is maintained with RMSprop warm-up, batch normalization without moving averages, and a slow-start learning rate schedule.

Contribution

The paper demonstrates rapid training of a large-scale neural network with a very large minibatch, combining training techniques that preserve accuracy with a description of the hardware and software system used.

Findings

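01

90-epoch ResNet-50 training on ImageNet completed in 15 minutes on 1024 Tesla P100 GPUs

02

Accuracy maintained at a minibatch size of 32k using RMSprop warm-up, batch normalization without moving averages, and a slow-start learning rate schedule

03

Hardware and software details of the system are described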

Abstract

We demonstrate that training ResNet-50 on ImageNet for 90 epochs can be achieved in 15 minutes with 1024 Tesla P100 GPUs. This was made possible by using a large minibatch size of 32k. To maintain accuracy with this large minibatch size, we employed several techniques such as RMSprop warm-up, batch normalization without moving averages, and a slow-start learning rate schedule. This paper also describes the details of the hardware and software of the system used to achieve the above performance.
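The figures in the abstract imply a few derived quantities that are easy to check by hand. The following is a rough back-of-the-envelope sketch, not the authors' code. It assumes that "32k" means 32,768 samples, that the minibatch is split evenly across GPUs, that the ImageNet-1k training set holds 1,281,167 images (a figure not stated in the abstract), and that the run takes exactly 15 minutes.

```python
import math

# Values taken from the abstract.
NUM_GPUS = 1024
EPOCHS = 90
WALL_CLOCK_SECONDS = 15 * 60

# Assumptions not stated in the abstract.
GLOBAL_BATCH = 32 * 1024      # "32k" read as 32,768 samples
TRAIN_IMAGES = 1_281_167      # ImageNet-1k training set size

# Samples processed by each GPU per iteration, assuming an even split.
per_gpu_batch = GLOBAL_BATCH // NUM_GPUS

# Iterations needed to cover the training set once, counting a final
# partial minibatch as a full iteration.
iters_per_epoch = math.ceil(TRAIN_IMAGES / GLOBAL_BATCH)
total_iters = iters_per_epoch * EPOCHS

# Approximate average time budget per iteration and overall throughput.
seconds_per_iter = WALL_CLOCK_SECONDS / total_iters
images_per_second = TRAIN_IMAGES * EPOCHS / WALL_CLOCK_SECONDS

print(f"per-GPU batch:      {per_gpu_batch}")
print(f"iterations/epoch:   {iters_per_epoch}")
print(f"total iterations:   {total_iters}")
print(f"seconds/iteration:  {seconds_per_iter:.3f}")
print(f"images/second:      {images_per_second:,.0f}")
```

Under these assumptions each GPU handles 32 samples per step, and the whole run amounts to only a few thousand parameter updates, which is one reason accuracy-preserving techniques matter at this minibatch size.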

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques

Methods: RMSProp