Improved Residual Networks for Image and Video Recognition
Ionut Cosmin Duta, Li Liu, Fan Zhu, Ling Shao

TL;DR
This paper introduces an improved residual network architecture that raises accuracy and makes much deeper networks trainable without increasing model complexity, with consistent gains across image and video recognition tasks.
Contribution
The authors propose modifications to all three main components of ResNets, enabling training of much deeper networks with better accuracy and convergence.
Findings
Improved top-1 accuracy on ImageNet with a 50-layer ResNet by 1.19% in one setting and around 2% in another, without increasing model complexity.
Successfully trained a 404-layer CNN on ImageNet, a depth regime in which the baseline ResNet shows severe optimization issues.
Demonstrated improvements across image classification, object detection, and video recognition datasets.
Abstract
Residual networks (ResNets) represent a powerful type of convolutional neural network (CNN) architecture, widely adopted and used in various tasks. In this work we propose an improved version of ResNets. Our proposed improvements address all three main components of a ResNet: the flow of information through the network layers, the residual building block, and the projection shortcut. We are able to show consistent improvements in accuracy and learning convergence over the baseline. For instance, on ImageNet dataset, using the ResNet with 50 layers, for top-1 accuracy we can report a 1.19% improvement over the baseline in one setting and around 2% boost in another. Importantly, these improvements are obtained without increasing the model complexity. Our proposed approach allows us to train extremely deep networks, while the baseline shows severe optimization issues. We report results on…
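The abstract names the projection shortcut as one of the three components being revised, but does not spell out the change, so the sketch below does not attempt to reproduce the authors' design. It only illustrates the baseline being improved upon, assuming the standard ResNet projection shortcut (a 1x1 convolution with stride 2, no padding): with a 1x1 kernel, each output value reads a single input position, so striding skips most of the feature map. The function names are hypothetical and chosen for illustration.

```python
# Illustration only: which input positions does a 1x1 convolution with
# stride 2 (the standard baseline ResNet projection shortcut) read?
# This is NOT the paper's modified shortcut; all names are hypothetical.


def output_size(size, kernel=1, stride=2, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def strided_1x1_positions(height, width, stride=2):
    """(row, col) input positions sampled by an unpadded 1x1 strided conv."""
    return [
        (r, c)
        for r in range(0, height, stride)
        for c in range(0, width, stride)
    ]


def fraction_used(height, width, stride=2):
    """Share of input activations that contribute to the shortcut output."""
    used = len(strided_1x1_positions(height, width, stride))
    return used / (height * width)


if __name__ == "__main__":
    h = w = 56  # an example feature-map size, chosen for illustration
    print("output size:", output_size(h))
    print("fraction of activations used:", fraction_used(h, w))
```

Under these assumptions, a stride-2 shortcut keeps one position in every 2x2 window and ignores the rest, which is one reason the shortcut path is a natural target for redesign.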
Taxonomy
Topics: Advanced Neural Network Applications · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning
Methods: Average Pooling · 1x1 Convolution · Batch Normalization · Bottleneck Residual Block · Global Average Pooling · Residual Block · Kaiming Initialization · Max Pooling · Residual Connection
