Residual Connections Encourage Iterative Inference
Stanisław Jastrzębski, Devansh Arpit, Nicolas Ballas, Vikas Verma, Tong Che, Yoshua Bengio

TL;DR
This paper investigates how residual connections in Resnets facilitate iterative feature refinement. It shows that Resnets perform both representation learning and iterative refinement, characterizes how these two behaviors are distributed across layers, and identifies simple strategies that prevent overfitting when residual layers are shared.
Contribution
The paper formalizes the concept of iterative refinement in Resnets and provides analytical and empirical evidence of how residual connections promote this process.
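The analytical part of this formalization rests on a first-order Taylor expansion of the loss around a block's input. A sketch of that argument, assuming the loss \mathcal{L} is differentiable and the residual branch output F_i(h_i) is small in norm (the symbols h_i, F_i, \mathcal{L} are notation assumed here):

```latex
\begin{aligned}
h_{i+1} &= h_i + F_i(h_i), \\
\mathcal{L}(h_{i+1}) &= \mathcal{L}\big(h_i + F_i(h_i)\big) \\
&\approx \mathcal{L}(h_i) + F_i(h_i)^{\top} \frac{\partial \mathcal{L}(h_i)}{\partial h_i} + O\big(\|F_i(h_i)\|_2^2\big).
\end{aligned}
```

To first order, block i lowers the loss exactly when F_i(h_i)^T ∂L(h_i)/∂h_i < 0, i.e. when cos(F_i(h_i), ∂L(h_i)/∂h_i) < 0, so minimizing the loss pushes each residual branch toward the negative gradient of the loss with respect to its input.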
Findings
Resnets encourage features to move along the negative gradient of the loss as they pass from one residual block to the next.
Early layers focus on representation learning; higher layers perform iterative refinement.
Naively sharing residual layers can cause overfitting, but simple strategies mitigate this.
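The first finding can be checked numerically by comparing a residual branch's output with the loss gradient at its input. The sketch below is a toy illustration, not the paper's experimental setup: the quadratic loss, the `TARGET` vector, and the `residual_branch` function are all hypothetical. It computes cos(F(h), dL/dh) and compares the actual loss after the residual step with its first-order Taylor prediction.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def cosine(a, b):
    # Cosine similarity between two vectors
    return dot(a, b) / (norm(a) * norm(b))


# Hypothetical toy loss on a 3-d feature vector: L(h) = 0.5 * ||h - TARGET||^2
TARGET = [1.0, -2.0, 0.5]


def loss(h):
    return 0.5 * sum((x - t) ** 2 for x, t in zip(h, TARGET))


def grad_loss(h):
    # Analytic gradient dL/dh of the quadratic loss above
    return [x - t for x, t in zip(h, TARGET)]


def residual_branch(h):
    # Hypothetical residual branch F(h): a small step against the gradient
    # plus a fixed offset, so F is not exactly parallel to -dL/dh
    g = grad_loss(h)
    return [-0.1 * gi + 0.02 for gi in g]


h = [0.0, 0.0, 0.0]
g = grad_loss(h)
f = residual_branch(h)
h_next = [x + y for x, y in zip(h, f)]  # h_{i+1} = h_i + F(h_i)

cos_fg = cosine(f, g)                  # negative => F points downhill
first_order = loss(h) + dot(f, g)      # L(h) + F(h) . dL/dh
actual = loss(h_next)

print(f"cos(F, dL/dh) = {cos_fg:.4f}")
print(f"L(h) = {loss(h):.4f}, first-order = {first_order:.4f}, actual = {actual:.4f}")
```

Because this toy loss is quadratic, the gap between `actual` and `first_order` is exactly 0.5 * ||F(h)||^2, which shows why the first-order argument needs the residual update to be small relative to the loss curvature.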
Abstract
Residual networks (Resnets) have become a prominent architecture in deep learning. However, a comprehensive understanding of Resnets is still a topic of ongoing research. A recent view argues that Resnets perform iterative refinement of features. We attempt to further expose properties of this aspect. To this end, we study Resnets both analytically and empirically. We formalize the notion of iterative refinement in Resnets by showing that residual connections naturally encourage features of residual blocks to move along the negative gradient of the loss as we go from one block to the next. In addition, our empirical analysis suggests that Resnets are able to perform both representation learning and iterative refinement. In general, a Resnet block tends to concentrate representation learning behavior in the first few layers while higher layers perform iterative refinement of features.…
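One simple way to probe where representation learning gives way to iterative refinement is to measure how large each block's update is relative to its input, ||F_i(h_i)|| / ||h_i||: large ratios indicate blocks that substantially change the representation, small ratios indicate small refining steps. The sketch below computes this ratio along a stack of residual blocks; the branches (`scale * tanh(h)`) and their scales are hypothetical, chosen only to illustrate the diagnostic rather than taken from the paper's experiments.

```python
import math


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def forward_with_ratios(h0, branches):
    """Run h_{i+1} = h_i + F_i(h_i) and record ||F_i(h_i)|| / ||h_i|| per block."""
    h = list(h0)
    ratios = []
    for branch in branches:
        f = branch(h)
        ratios.append(norm(f) / norm(h))
        h = [x + y for x, y in zip(h, f)]  # residual (identity) connection
    return h, ratios


def make_branch(scale):
    # Hypothetical residual branch: element-wise scale * tanh(h)
    return lambda h: [scale * math.tanh(x) for x in h]


# Hypothetical stack: early blocks make large updates, later blocks small ones
branches = [make_branch(s) for s in (1.0, 0.5, 0.05, 0.02)]
h_final, ratios = forward_with_ratios([0.5, -1.0, 2.0], branches)

for i, r in enumerate(ratios):
    print(f"block {i}: ||F(h)|| / ||h|| = {r:.4f}")
```

In a trained network the same loop would be applied to the actual per-block activations, with the norms averaged over a batch of inputs.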
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning · Advanced Neural Network Applications
Methods: Average Pooling · 1x1 Convolution · Batch Normalization · Bottleneck Residual Block · Global Average Pooling · Residual Block · Kaiming Initialization · Max Pooling · Residual Connection
