Training EfficientNets at Supercomputer Scale: 83% ImageNet Top-1 Accuracy in One Hour

Arissa Wongpanich; Hieu Pham; James Demmel; Mingxing Tan; Quoc Le; Yang You; Sameer Kumar

arXiv:2011.00071 · cs.LG · November 6, 2020

TL;DR

This paper shows how to scale EfficientNet training to supercomputer scale (TPU-v3 Pods), reaching 83% ImageNet top-1 accuracy in 1 hour and 4 minutes by tuning large-batch optimizers and learning rate schedules and by using distributed evaluation and batch normalization techniques.

Contribution

The paper introduces optimizations for large-scale EfficientNet training on TPU-v3 Pods, reducing training time from the order of days to about an hour while maintaining high accuracy (83% ImageNet top-1).

Findings

1. Achieved 83% ImageNet Top-1 accuracy in 1 hour and 4 minutes.
2. Scaled training to a batch size of 65536 on 1024 TPU-v3 cores by selecting large-batch optimizers and learning rate schedules.
3. Presented timing and performance benchmarks for EfficientNet models trained on ImageNet at supercomputer scale.
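
The findings mention large-batch learning rate schedules without giving their form. A common pattern at this scale has three parts. The peak learning rate is scaled linearly with batch size, the rate is warmed up linearly, and then it decays polynomially. The Python sketch below illustrates that generic pattern under stated assumptions. The functions `scaled_peak_lr` and `learning_rate`, the reference batch of 256, the decay power, and the epoch counts in the usage block are all hypothetical choices for illustration. They are not the paper's reported schedule or hyperparameters. The sketch also shows why warmup spans very few steps here: at a batch of 65536, one pass over ImageNet's 1,281,167 training images takes only 20 steps.

```python
import math

# Size of the ImageNet (ILSVRC-2012) training set.
IMAGENET_TRAIN_IMAGES = 1_281_167
# Global batch size quoted in the abstract.
GLOBAL_BATCH = 65_536

# With ~1.28M images and a 65536-image batch, one epoch is only ~20 steps.
STEPS_PER_EPOCH = math.ceil(IMAGENET_TRAIN_IMAGES / GLOBAL_BATCH)


def scaled_peak_lr(base_lr: float, global_batch: int,
                   reference_batch: int = 256) -> float:
    """Hypothetical helper: linear scaling rule, peak LR grows with batch size."""
    return base_lr * global_batch / reference_batch


def learning_rate(step: int, total_steps: int, warmup_steps: int,
                  peak_lr: float, power: float = 2.0) -> float:
    """Hypothetical schedule: linear warmup to peak_lr, then polynomial
    decay to zero at total_steps."""
    if step < warmup_steps:
        # Ramp up linearly so early updates at a huge batch are not too large.
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * (1.0 - min(progress, 1.0)) ** power


if __name__ == "__main__":
    # Hypothetical hyperparameters, chosen only to exercise the functions.
    warmup = 5 * STEPS_PER_EPOCH
    total = 350 * STEPS_PER_EPOCH
    peak = scaled_peak_lr(0.016, GLOBAL_BATCH)
    for s in (0, warmup - 1, total // 2, total):
        print(s, round(learning_rate(s, total, warmup, peak), 4))
```

Large-batch optimizers such as LARS are often paired with a schedule of this shape. This sketch only computes the scalar learning rate per step and leaves the optimizer itself out.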

Abstract

EfficientNets are a family of state-of-the-art image classification models based on efficiently scaled convolutional neural networks. Currently, EfficientNets can take on the order of days to train; for example, training an EfficientNet-B0 model takes 23 hours on a Cloud TPU v2-8 node. In this paper, we explore techniques to scale up the training of EfficientNets on TPU-v3 Pods with 2048 cores, motivated by speedups that can be achieved when training at such scales. We discuss optimizations required to scale training to a batch size of 65536 on 1024 TPU-v3 cores, such as selecting large batch optimizers and learning rate schedules as well as utilizing distributed evaluation and batch normalization techniques. Additionally, we present timing and performance benchmarks for EfficientNet models trained on the ImageNet dataset in order to analyze the behavior of EfficientNets at scale. With…
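
The abstract lists batch normalization techniques among the optimizations needed at scale. With 65536 images spread over 1024 cores, each core sees only 65536 / 1024 = 64 images per step. One common remedy is to compute batch-norm statistics over a group of replicas instead of per core. The plain-Python sketch below emulates that grouped aggregation for a single channel. `group_batch_norm_stats`, `local_moments`, and the sum/sum-of-squares formulation are hypothetical names and choices for illustration, not the paper's code. On real hardware, the inner loop would be a grouped cross-replica all-reduce rather than a Python loop.

```python
from typing import List, Sequence, Tuple

GLOBAL_BATCH = 65_536
NUM_CORES = 1_024
PER_CORE_BATCH = GLOBAL_BATCH // NUM_CORES  # 64 images per core per step


def local_moments(values: Sequence[float]) -> Tuple[float, float, int]:
    """Hypothetical helper: one replica's partial sums for a single channel."""
    return float(sum(values)), float(sum(v * v for v in values)), len(values)


def group_batch_norm_stats(
    per_replica: List[Sequence[float]], group_size: int
) -> List[Tuple[float, float]]:
    """Hypothetical helper: (mean, biased variance) for each group of
    `group_size` consecutive replicas, emulating a grouped all-reduce."""
    if group_size <= 0 or len(per_replica) % group_size != 0:
        raise ValueError("replica count must be a positive multiple of group_size")
    stats = []
    for start in range(0, len(per_replica), group_size):
        total = total_sq = 0.0
        count = 0
        # Summing partial sums is what the all-reduce would do within the group.
        for values in per_replica[start:start + group_size]:
            s, sq, n = local_moments(values)
            total += s
            total_sq += sq
            count += n
        mean = total / count
        # E[x^2] - E[x]^2: simple, but less numerically stable than Welford-style merging.
        var = total_sq / count - mean * mean
        stats.append((mean, var))
    return stats


if __name__ == "__main__":
    # Four toy replicas with two activations each (real replicas would hold 64).
    replicas = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
    print(group_batch_norm_stats(replicas, group_size=1))  # per-core statistics
    print(group_batch_norm_stats(replicas, group_size=2))  # statistics over pairs of cores
```

Larger groups give each replica statistics estimated from more examples, at the cost of more cross-core communication per batch-norm layer.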

Taxonomy

Methods: Depthwise Convolution · Pointwise Convolution · Depthwise Separable Convolution · Sigmoid Activation · Dropout · Inverted Residual Block · 1x1 Convolution · Dense Connections · Squeeze-and-Excitation Block