Residual Connections Encourage Iterative Inference

Stanisław Jastrzębski; Devansh Arpit; Nicolas Ballas; Vikas Verma; Tong Che; Yoshua Bengio

arXiv:1710.04773·cs.CV·March 9, 2018·43 cites

Open Access

TL;DR

This paper studies how residual connections in Resnets support iterative refinement of features. It finds that Resnets perform both representation learning and iterative refinement, describes which layers tend to do which, and examines how sharing residual layers can lead to overfitting and how that can be mitigated.

Contribution

The paper formalizes the concept of iterative refinement in Resnets and provides analytical and empirical evidence of how residual connections promote this process.
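The formalization mentioned above can be sketched with a first-order Taylor expansion. This is a reconstruction under assumed notation, not a quotation of the paper: $h_i$ is the feature vector entering residual block $i$, $F_i$ is that block's residual function, and $\mathcal{L}$ is the loss viewed as a function of the features.

```latex
% Residual block update (assumed notation):
h_{i+1} = h_i + F_i(h_i)

% First-order expansion of the loss around h_i:
\mathcal{L}(h_{i+1})
  = \mathcal{L}\bigl(h_i + F_i(h_i)\bigr)
  = \mathcal{L}(h_i)
  + \Bigl\langle F_i(h_i),\, \frac{\partial \mathcal{L}(h_i)}{\partial h_i} \Bigr\rangle
  + \mathcal{O}\bigl(\lVert F_i(h_i) \rVert^2\bigr)

% Applying the expansion recursively from block i up to the last block L:
\mathcal{L}(h_L)
  = \mathcal{L}(h_i)
  + \sum_{j=i}^{L-1}
    \Bigl\langle F_j(h_j),\, \frac{\partial \mathcal{L}(h_j)}{\partial h_j} \Bigr\rangle
  + \text{higher-order terms}
```

Read this way, minimizing $\mathcal{L}(h_L)$ rewards each block for producing an update $F_j(h_j)$ whose inner product with the gradient $\partial \mathcal{L}(h_j) / \partial h_j$ is negative, which is the sense in which features "move along the negative gradient of loss". The approximation is only meaningful when the residual updates are small relative to $h_j$.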

Findings

01. Resnets encourage features to move along the negative gradient of the loss.

02. Early layers focus on representation learning, while higher layers perform iterative refinement.

03. Naive sharing of residual layers can cause overfitting, but simple strategies mitigate this.
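The first finding can be illustrated with a small numerical sketch. Everything in it is an assumption made for illustration rather than the paper's experimental setup: the quadratic `loss`, the hand-written `toy_block` (a stand-in for a trained residual block), and the helper names are hypothetical. It measures the cosine similarity between each residual update and the loss gradient at the block's input, the kind of alignment the finding describes.

```python
import numpy as np

# Toy setup (hypothetical): the loss is a fixed quadratic function of the features.
target = np.linspace(-1.0, 1.0, 8)

def loss(h):
    """Quadratic loss of features h against the fixed target."""
    return 0.5 * float(np.sum((h - target) ** 2))

def loss_grad(h):
    """Gradient of the quadratic loss with respect to h."""
    return h - target

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def toy_block(h, step=0.1, perturb=0.01):
    """Stand-in for a residual function F(h): a small step against the
    gradient plus a fixed perturbation, so the alignment is not exactly -1."""
    return -step * loss_grad(h) + perturb * np.cos(np.arange(h.size))

def residual_step(h, block):
    """Apply one residual block: h_next = h + F(h). Returns (h_next, F(h))."""
    update = block(h)
    return h + update, update

h = np.full(8, 2.0)          # initial features
losses = [loss(h)]           # loss before any block, then after each block
alignments = []              # cosine(F(h), dL/dh) at each block's input

for _ in range(5):
    grad = loss_grad(h)
    h, update = residual_step(h, toy_block)
    alignments.append(cosine(update, grad))
    losses.append(loss(h))

print("alignments:", np.round(alignments, 3))
print("losses:    ", np.round(losses, 3))
```

A negative cosine means the block moved the features in a direction that, to first order, lowers the loss. In a real network the same measurement would use the gradient obtained by backpropagation at each block's input instead of the closed-form `loss_grad`.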

Abstract

Residual networks (Resnets) have become a prominent architecture in deep learning. However, a comprehensive understanding of Resnets is still a topic of ongoing research. A recent view argues that Resnets perform iterative refinement of features. We attempt to further expose properties of this aspect. To this end, we study Resnets both analytically and empirically. We formalize the notion of iterative refinement in Resnets by showing that residual connections naturally encourage features of residual blocks to move along the negative gradient of loss as we go from one block to the next. In addition, our empirical analysis suggests that Resnets are able to perform both representation learning and iterative refinement. In general, a Resnet block tends to concentrate representation learning behavior in the first few layers while higher layers perform iterative refinement of features.…

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning · Advanced Neural Network Applications

Methods: Average Pooling · 1x1 Convolution · Batch Normalization · Bottleneck Residual Block · Global Average Pooling · Residual Block · Kaiming Initialization · Max Pooling · Residual Connection