Identity Mappings in Deep Residual Networks
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun

TL;DR
This paper analyzes the role of identity mappings in deep residual networks, demonstrating their importance for signal propagation and proposing a new residual unit that enhances training and generalization, leading to improved results on multiple datasets.
Contribution
The paper introduces a new residual unit built around identity mappings, which makes very deep ResNets easier to train and improves their generalization.
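As a rough illustration of such a unit, the sketch below keeps the skip path as a pure identity and places normalization and ReLU before each weight layer, so nothing is applied after the addition. It is a minimal sketch under stated assumptions, not the authors' implementation: the dense `linear` layer stands in for a convolution, `batch_norm` has no learned scale or shift, and every function name here is hypothetical.

```python
import math
import random


def batch_norm(batch, eps=1e-5):
    """Normalize each feature across the batch (no learned scale/shift)."""
    n, dim = len(batch), len(batch[0])
    out = [[0.0] * dim for _ in range(n)]
    for j in range(dim):
        col = [row[j] for row in batch]
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / n
        for i in range(n):
            out[i][j] = (batch[i][j] - mean) / math.sqrt(var + eps)
    return out


def relu(batch):
    return [[max(0.0, v) for v in row] for row in batch]


def linear(batch, weight):
    """Assumed stand-in for a convolution: y = x @ weight."""
    cols = list(zip(*weight))
    return [[sum(x * w for x, w in zip(row, col)) for col in cols]
            for row in batch]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def residual_branch(batch, w1, w2):
    """F(x): norm -> ReLU -> weight, applied twice."""
    h = linear(relu(batch_norm(batch)), w1)
    return linear(relu(batch_norm(h)), w2)


def preact_unit(batch, w1, w2):
    """x + F(x): identity skip, no activation after the addition."""
    return add(batch, residual_branch(batch, w1, w2))


def make_weights(dim, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(dim)]


rng = random.Random(0)
dim, depth = 4, 5
x0 = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(8)]
weights = [(make_weights(dim, rng), make_weights(dim, rng))
           for _ in range(depth)]

# Stack the units while accumulating every residual branch output.
x = x0
residual_sum = [[0.0] * dim for _ in x0]
for w1, w2 in weights:
    f = residual_branch(x, w1, w2)
    residual_sum = add(residual_sum, f)
    x = add(x, f)

# Largest gap between the stack output and x0 + (sum of all residuals).
max_gap = max(abs(a - (b + c))
              for ra, rb, rc in zip(x, x0, residual_sum)
              for a, b, c in zip(ra, rb, rc))
print(max_gap)
```

Because the skip path never transforms the signal, the output of the stack equals the input plus the accumulated residuals up to floating-point rounding, which is what `max_gap` measures.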
Findings
Improved accuracy with a 1001-layer ResNet on CIFAR-10 (4.62% error) and CIFAR-100
Easier training and better convergence
Improved results with a 200-layer ResNet on ImageNet
Abstract
Deep residual networks have emerged as a family of extremely deep architectures showing compelling accuracy and nice convergence behaviors. In this paper, we analyze the propagation formulations behind the residual building blocks, which suggest that the forward and backward signals can be directly propagated from one block to any other block, when using identity mappings as the skip connections and after-addition activation. A series of ablation experiments support the importance of these identity mappings. This motivates us to propose a new residual unit, which makes training easier and improves generalization. We report improved results using a 1001-layer ResNet on CIFAR-10 (4.62% error) and CIFAR-100, and a 200-layer ResNet on ImageNet. Code is available at: https://github.com/KaimingHe/resnet-1k-layers
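The direct-propagation claim in the abstract can be written out in a few lines. The block below is a sketch in the paper's notation, which the abstract itself does not define: $x_l$ is the input to unit $l$, $\mathcal{F}$ is the residual function with weights $\mathcal{W}_l$, and $\mathcal{E}$ is the loss. It assumes an identity skip connection and no activation applied after the addition.

```latex
\begin{align}
x_{l+1} &= x_l + \mathcal{F}(x_l, \mathcal{W}_l) \\
x_L &= x_l + \sum_{i=l}^{L-1} \mathcal{F}(x_i, \mathcal{W}_i) \\
\frac{\partial \mathcal{E}}{\partial x_l}
  &= \frac{\partial \mathcal{E}}{\partial x_L}
     \left( 1 + \frac{\partial}{\partial x_l}
     \sum_{i=l}^{L-1} \mathcal{F}(x_i, \mathcal{W}_i) \right)
\end{align}
```

The second line follows by applying the first recursively from unit $l$ to unit $L$. In the third line, the term $1$ carries the gradient from $x_L$ to $x_l$ without passing through any weight layer, so the signal reaches shallow units directly in both directions.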
Code & Models
- 🤗 facebook/esm1b_t33_650M_UR50S · model · 6.8k dl · ♡ 22
- 🤗 timm/resnetv2_50.a1h_in1k · model · 1.1k dl
- 🤗 timm/resnetv2_50d_evos.ah_in1k · model · 107 dl · ♡ 1
- 🤗 timm/resnetv2_50d_gn.ah_in1k · model · 473 dl
- 🤗 timm/resnetv2_50x1_bit.goog_distilled_in1k · model · 661 dl
- 🤗 timm/resnetv2_50x1_bit.goog_in21k · model · 4.2k dl · ♡ 5
- 🤗 timm/resnetv2_50x1_bit.goog_in21k_ft_in1k · model · 2.4k dl
- 🤗 timm/resnetv2_50x3_bit.goog_in21k · model · 118 dl
- 🤗 timm/resnetv2_50x3_bit.goog_in21k_ft_in1k · model · 108 dl · ♡ 1
- 🤗 timm/resnetv2_101.a1h_in1k · model · 607 dl
Taxonomy
Topics: Advanced Neural Network Applications · Geophysical Methods and Applications · Adversarial Robustness in Machine Learning
Methods: Average Pooling · Affine Coupling · Normalizing Flows · 1x1 Convolution · Batch Normalization · Random Resized Crop · Random Horizontal Flip · Step Decay · SGD with Momentum
