Revisiting ResNets: Improved Training and Scaling Strategies

Irwan Bello; William Fedus; Xianzhi Du; Ekin D. Cubuk; Aravind Srinivas; Tsung-Yi Lin; Jonathon Shlens; Barret Zoph

arXiv:2103.07579 · cs.CV · March 16, 2021 · 210 cites

TL;DR

This paper revisits the canonical ResNet and finds that training and scaling strategies may matter more than architectural changes, yielding fast, accurate models that serve as strong baselines.

Contribution

The paper proposes two new scaling strategies and a family of ResNet-RS architectures that match or outperform state-of-the-art models in speed and accuracy.

Findings

01 ResNet training and scaling strategies may be more influential than architectural modifications.

02 ResNet-RS models are 1.7x to 2.7x faster than EfficientNets on TPUs at similar ImageNet accuracy.

03 In a large-scale semi-supervised setup, ResNet-RS achieves 86.2% top-1 accuracy on ImageNet; the improved training techniques also raise transfer learning performance.

Abstract

Novel computer vision architectures monopolize the spotlight, but the impact of the model architecture is often conflated with simultaneous changes to training methodology and scaling strategies. Our work revisits the canonical ResNet (He et al., 2015) and studies these three aspects in an effort to disentangle them. Perhaps surprisingly, we find that training and scaling strategies may matter more than architectural changes, and further, that the resulting ResNets match recent state-of-the-art models. We show that the best performing scaling strategy depends on the training regime and offer two new scaling strategies: (1) scale model depth in regimes where overfitting can occur (width scaling is preferable otherwise); (2) increase image resolution more slowly than previously recommended (Tan & Le, 2019). Using improved training and scaling strategies, we design a family of ResNet…

Code & Models

Videos

Revisiting ResNets: Improved Training and Scaling Strategies · SlidesLive

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning

Methods: Xavier Initialization · ResNet-D · Weight Decay · Label Smoothing · Cosine Annealing · Stochastic Depth · ResNet-RS · Depthwise Convolution · Pointwise Convolution · Average Pooling
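
Two of the listed methods, cosine annealing and label smoothing, are simple enough to sketch in a few lines. The block below is a minimal illustration in plain Python, not the authors' implementation. The function names `cosine_annealing_lr` and `smooth_labels`, the base learning rate of 0.1, and the smoothing factor of 0.1 are assumptions chosen for the example, not values taken from this page.

```python
import math


def cosine_annealing_lr(step, total_steps, base_lr, min_lr=0.0):
    """Learning rate at `step` under a half-cosine decay from base_lr to min_lr.

    Hypothetical helper for illustration; no warmup phase is modeled.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Fraction of training completed, clamped to [0, 1].
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def smooth_labels(target_index, num_classes, smoothing):
    """Mix a one-hot target with the uniform distribution over classes.

    Every class receives smoothing / num_classes, and the true class
    additionally keeps the remaining 1 - smoothing of the probability mass.
    """
    if not 0.0 <= smoothing < 1.0:
        raise ValueError("smoothing must be in [0, 1)")
    off_value = smoothing / num_classes
    on_value = 1.0 - smoothing + off_value
    return [on_value if i == target_index else off_value for i in range(num_classes)]


if __name__ == "__main__":
    # Assumed example values: 100 steps, base learning rate 0.1.
    for step in (0, 25, 50, 75, 100):
        print(step, round(cosine_annealing_lr(step, 100, base_lr=0.1), 5))

    # Assumed example values: 4 classes, smoothing factor 0.1.
    print(smooth_labels(target_index=2, num_classes=4, smoothing=0.1))
```

The schedule starts at the base learning rate and decays smoothly to the minimum by the final step, while the smoothed target remains a valid probability distribution that still places most of its mass on the true class.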