Improved Residual Networks for Image and Video Recognition

Ionut Cosmin Duta; Li Liu; Fan Zhu; Ling Shao

arXiv:2004.04989 · cs.CV · April 13, 2020 · 25 citations

TL;DR

This paper introduces an improved residual network architecture that raises accuracy and makes much deeper networks trainable without increasing model complexity, with consistent gains across multiple image and video recognition tasks.

Contribution

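The authors propose modifications to all three main components of ResNets (the flow of information through the network layers, the residual building block, and the projection shortcut), which enable training of much deeper networks with better accuracy and faster convergence.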
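For reference, the residual building block being modified can be sketched in its baseline bottleneck form (1x1 reduce, 3x3, 1x1 expand, plus an identity shortcut). This is a minimal NumPy sketch of the standard ResNet bottleneck under stated assumptions, not the authors' improved block: `normalize` is a simplified stand-in for batch normalization (per-channel statistics of a single sample, no learned scale or shift), and all function names are hypothetical. Weights use Kaiming (He) normal initialization, which appears in the methods list below.

```python
import numpy as np


def conv2d(x, w, stride=1):
    """Naive 2D convolution with zero padding k // 2 (odd kernel sizes).

    x: input of shape (C_in, H, W); w: weights of shape (C_out, C_in, k, k).
    """
    c_out, _, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    h_out = (x.shape[1] + 2 * pad - k) // stride + 1
    w_out = (x.shape[2] + 2 * pad - k) // stride + 1
    out = np.zeros((c_out, h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            patch = xp[:, i * stride:i * stride + k, j * stride:j * stride + k]
            # Contract over (C_in, k, k) to get one value per output channel.
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out


def normalize(x, eps=1e-5):
    """Simplified stand-in for batch normalization (assumption): per-channel
    zero mean and unit variance over the spatial positions of one sample."""
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def relu(x):
    return np.maximum(x, 0.0)


def kaiming(rng, shape):
    """He (Kaiming) normal initialization: std = sqrt(2 / fan_in)."""
    fan_in = shape[1] * shape[2] * shape[3]
    return rng.standard_normal(shape) * np.sqrt(2.0 / fan_in)


def bottleneck_block(x, w1, w2, w3):
    """Baseline-style bottleneck with an identity shortcut."""
    out = relu(normalize(conv2d(x, w1)))    # 1x1 conv reduces channels
    out = relu(normalize(conv2d(out, w2)))  # 3x3 conv at reduced width
    out = normalize(conv2d(out, w3))        # 1x1 conv restores channels
    return relu(out + x)                    # add the shortcut, then ReLU


rng = np.random.default_rng(0)
x = rng.standard_normal((16, 8, 8))
w1 = kaiming(rng, (4, 16, 1, 1))
w2 = kaiming(rng, (4, 4, 3, 3))
w3 = kaiming(rng, (16, 4, 1, 1))
y = bottleneck_block(x, w1, w2, w3)
print(y.shape)  # (16, 8, 8)
```

Because the shortcut is an identity, the block's output keeps the input's shape; blocks that change resolution or width need a projection shortcut instead.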

Findings

1. Achieved up to about 2% top-1 accuracy improvement on ImageNet with a 50-layer ResNet.

2. Successfully trained a 404-layer CNN on ImageNet, whereas the baseline ResNet shows severe optimization issues at such extreme depths.

3. Demonstrated improvements across image classification, object detection, and video recognition datasets.
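Depth figures such as 50 and 404 count weighted layers. Under the standard bottleneck-ResNet convention, assumed here (three convolutions per block, plus the stem convolution and the final fully connected layer), depth = 3 × blocks + 2. The helper names below are hypothetical, and because the summary does not say how the 404-layer network distributes its blocks across stages, the sketch only recovers the total block count.

```python
def bottleneck_depth(blocks_per_stage):
    """Weighted-layer count for a bottleneck ResNet: 3 conv layers per block,
    plus the stem convolution and the final fully connected layer."""
    return 3 * sum(blocks_per_stage) + 2


def blocks_for_depth(depth):
    """Total bottleneck blocks implied by a depth under the same convention."""
    if (depth - 2) % 3 != 0:
        raise ValueError(f"depth {depth} is not of the form 3 * blocks + 2")
    return (depth - 2) // 3


print(bottleneck_depth([3, 4, 6, 3]))   # 50 (ResNet-50)
print(bottleneck_depth([3, 4, 23, 3]))  # 101 (ResNet-101)
print(blocks_for_depth(404))            # 134
```

Under this assumption, a 404-layer network stacks 134 bottleneck blocks, roughly eight times the 16 blocks of ResNet-50.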

Abstract

Residual networks (ResNets) represent a powerful type of convolutional neural network (CNN) architecture, widely adopted and used in various tasks. In this work we propose an improved version of ResNets. Our proposed improvements address all three main components of a ResNet: the flow of information through the network layers, the residual building block, and the projection shortcut. We are able to show consistent improvements in accuracy and learning convergence over the baseline. For instance, on the ImageNet dataset, using the ResNet with 50 layers, for top-1 accuracy we can report a 1.19% improvement over the baseline in one setting and around 2% boost in another. Importantly, these improvements are obtained without increasing the model complexity. Our proposed approach allows us to train extremely deep networks, while the baseline shows severe optimization issues. We report results on…
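To see why the projection shortcut is worth revisiting, note that in the baseline ResNet a downsampling projection shortcut is typically a 1x1 convolution with stride 2 (followed by batch normalization). A 1x1 kernel with stride 2 in both spatial dimensions reads only one of every four positions, so three quarters of the input activations never reach the shortcut output. The NumPy sketch below illustrates this and contrasts it with one possible alternative, 3x3 max pooling with stride 2 followed by a stride-1 1x1 convolution. That alternative is an assumption made for illustration, not a statement of the paper's exact design; batch normalization is omitted, and all function names are hypothetical.

```python
import numpy as np


def conv1x1(x, w, stride=1):
    """1x1 convolution. x: (C_in, H, W); w: (C_out, C_in).
    With stride s, only every s-th row and column is read."""
    return np.einsum("oc,chw->ohw", w, x[:, ::stride, ::stride])


def max_pool_3x3_s2(x):
    """3x3 max pooling, stride 2, padding 1 (padded with -inf so padding never wins)."""
    c, h, w = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)), constant_values=-np.inf)
    h_out = (h + 2 - 3) // 2 + 1
    w_out = (w + 2 - 3) // 2 + 1
    out = np.empty((c, h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            out[:, i, j] = xp[:, 2 * i:2 * i + 3, 2 * j:2 * j + 3].max(axis=(1, 2))
    return out


def shortcut_strided(x, w):
    """Baseline-style downsampling shortcut: stride-2 1x1 convolution."""
    return conv1x1(x, w, stride=2)


def shortcut_pool_then_conv(x, w):
    """Alternative (assumed for illustration): max pool first, then stride-1 1x1 conv."""
    return conv1x1(max_pool_3x3_s2(x), w, stride=1)


# A single strong activation at an odd position (row 1, column 1).
x = np.zeros((1, 8, 8))
x[0, 1, 1] = 5.0
w = np.ones((1, 1))

print(shortcut_strided(x, w).max())         # 0.0: the activation is skipped
print(shortcut_pool_then_conv(x, w).max())  # 5.0: pooling lets it through
```

Both variants produce the same output shape, so either could be added to the main path of a downsampling block; they differ only in which input positions can influence the shortcut.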

Taxonomy

Topics: Advanced Neural Network Applications · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning

Methods: Average Pooling · 1x1 Convolution · Batch Normalization · Bottleneck Residual Block · Global Average Pooling · Residual Block · Kaiming Initialization · Max Pooling · Residual Connection