Revisiting ResNets: Improved Training and Scaling Strategies
Irwan Bello, William Fedus, Xianzhi Du, Ekin D. Cubuk, Aravind Srinivas, Tsung-Yi Lin, Jonathon Shlens, Barret Zoph

TL;DR
This paper revisits ResNet architectures, demonstrating that training and scaling strategies are more impactful than architecture changes, leading to faster, high-performing models suitable as strong baselines.
Contribution
The paper introduces two new scaling strategies and ResNet-RS, a family of architectures that matches or outperforms state-of-the-art models in speed and accuracy.
Findings
Training and scaling strategies may matter more than architectural changes for ResNet performance.
ResNet-RS models are 1.7x - 2.7x faster than EfficientNets on TPUs with similar accuracy.
In a large-scale semi-supervised learning setup, ResNet-RS achieves 86.2% top-1 accuracy on ImageNet; the improved training techniques also improve transfer learning performance.
Abstract
Novel computer vision architectures monopolize the spotlight, but the impact of the model architecture is often conflated with simultaneous changes to training methodology and scaling strategies. Our work revisits the canonical ResNet (He et al., 2015) and studies these three aspects in an effort to disentangle them. Perhaps surprisingly, we find that training and scaling strategies may matter more than architectural changes, and further, that the resulting ResNets match recent state-of-the-art models. We show that the best performing scaling strategy depends on the training regime and offer two new scaling strategies: (1) scale model depth in regimes where overfitting can occur (width scaling is preferable otherwise); (2) increase image resolution more slowly than previously recommended (Tan & Le, 2019). Using improved training and scaling strategies, we design a family of ResNet…
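Scaling strategy (1) from the abstract can be read as a simple decision rule. The following is a minimal sketch of that rule only; the function name and its boolean argument are hypothetical, introduced here for illustration, and the abstract does not specify how to detect the overfitting regime.

```python
def preferred_scaling_axis(overfitting_can_occur: bool) -> str:
    """Pick the dimension to scale, following strategy (1) in the abstract.

    Hypothetical helper: the abstract recommends scaling depth in regimes
    where overfitting can occur and scaling width otherwise. How to decide
    which regime applies is left to the caller.
    """
    return "depth" if overfitting_can_occur else "width"


if __name__ == "__main__":
    for regime in (True, False):
        print(f"overfitting_can_occur={regime}: scale {preferred_scaling_axis(regime)}")
```

Strategy (2), increasing image resolution more slowly than previously recommended, is not encoded here because the abstract gives no numeric rate.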
Code & Models
- 🤗 kadirnar/timm_model_list · model · ♡ 1
- 🤗 timm/resnetrs50.tf_in1k · model · 656 dl
- 🤗 timm/resnetrs101.tf_in1k · model · 189 dl
- 🤗 timm/resnetrs152.tf_in1k · model · 187 dl
- 🤗 timm/resnetrs200.tf_in1k · model · 153 dl
- 🤗 timm/resnetrs270.tf_in1k · model · 129 dl
- 🤗 timm/resnetrs350.tf_in1k · model · 108 dl
- 🤗 timm/resnetrs420.tf_in1k · model · 232 dl
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning
Methods: Xavier Initialization · ResNet-D · Weight Decay · Label Smoothing · Cosine Annealing · Stochastic Depth · ResNet-RS · Depthwise Convolution · Pointwise Convolution · Average Pooling
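
Three of the training methods tagged above (cosine annealing, label smoothing, stochastic depth) have compact textbook definitions. The sketch below shows those generic definitions in plain Python on scalars and lists; it is not the paper's implementation, all function names are hypothetical, and the numeric values in the usage section are illustrative rather than the paper's hyperparameters.

```python
import math
import random


def cosine_annealing_lr(step, total_steps, base_lr, min_lr=0.0):
    """Learning rate decayed from base_lr to min_lr along a half cosine."""
    progress = step / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def smooth_labels(true_class, num_classes, epsilon):
    """One-hot target mixed with a uniform distribution of total mass epsilon."""
    off_value = epsilon / num_classes
    return [
        (1.0 - epsilon) + off_value if k == true_class else off_value
        for k in range(num_classes)
    ]


def stochastic_depth(shortcut, residual, drop_prob, training, rng=random):
    """Combine a residual block's two branches (scalars, for illustration).

    Training: the residual branch is dropped with probability drop_prob.
    Evaluation: the residual branch is scaled by its survival probability.
    """
    if training:
        if rng.random() < drop_prob:
            return shortcut
        return shortcut + residual
    return shortcut + (1.0 - drop_prob) * residual


if __name__ == "__main__":
    # Illustrative values only, not taken from the paper.
    print([round(cosine_annealing_lr(s, 100, 0.1), 4) for s in (0, 50, 100)])
    print(smooth_labels(true_class=1, num_classes=4, epsilon=0.1))
    print(stochastic_depth(1.0, 2.0, drop_prob=0.25, training=False))
```

The stochastic depth function uses the formulation in which scaling is applied at evaluation time; many frameworks instead rescale the surviving branch during training, which gives the same expected output.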
