Residual Alignment: Uncovering the Mechanisms of Residual Networks
Jianing Li, Vardan Papyan

TL;DR
This paper empirically investigates the mechanisms behind ResNet's success, revealing a phenomenon called Residual Alignment that correlates with good generalization and involves geometric and spectral properties of residual Jacobians.
Contribution
The paper introduces the concept of Residual Alignment, characterizes its properties, and demonstrates that it occurs across various architectures and datasets, offering insight into why ResNets are effective.
Findings
Residual Alignment occurs in well-generalizing models
Residual Jacobians are low-rank and their singular vectors align
Residual Jacobian singular values scale inversely with depth
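The findings above concern the singular value decomposition of Residual Jacobians. As a minimal sketch of how such a measurement could be set up, the snippet below assumes a residual block of the form h_next = h + F(h) and takes the Residual Jacobian to be dF/dh. The toy branch F(h) = W2 tanh(W1 h), the helper names, and the alignment measure are illustrative assumptions, not the authors' code. With random, untrained weights no alignment should be expected; the sketch only shows the mechanics.

```python
import numpy as np

def residual_branch(h, W1, W2):
    """Toy residual branch F(h) = W2 @ tanh(W1 @ h) (hypothetical stand-in for a real block)."""
    return W2 @ np.tanh(W1 @ h)

def residual_jacobian(h, W1, W2):
    """Analytic Jacobian dF/dh of the toy branch, evaluated at h."""
    pre = W1 @ h
    return W2 @ (np.diag(1.0 - np.tanh(pre) ** 2) @ W1)

def top_singular_alignment(J_a, J_b, k=3):
    """Absolute inner products between the top-k left singular vectors of two Jacobians.

    A matrix close to the identity would indicate aligned singular vectors.
    This is one possible measure, assumed here for illustration.
    """
    U_a, _, _ = np.linalg.svd(J_a)
    U_b, _, _ = np.linalg.svd(J_b)
    return np.abs(U_a[:, :k].T @ U_b[:, :k])

rng = np.random.default_rng(0)
d, depth = 8, 4
h = rng.normal(size=d)
jacobians = []
for _ in range(depth):
    W1 = rng.normal(size=(d, d)) / np.sqrt(d)
    W2 = rng.normal(size=(d, d)) / np.sqrt(d)
    jacobians.append(residual_jacobian(h, W1, W2))  # linearize the block at its input
    h = h + residual_branch(h, W1, W2)              # skip connection

# Singular values per depth (sorted in descending order by NumPy).
singular_values = [np.linalg.svd(J, compute_uv=False) for J in jacobians]
# Alignment between the first two blocks' top left singular vectors.
alignment = top_singular_alignment(jacobians[0], jacobians[1])
```

In a trained network one would replace the toy branch with the actual residual branch and obtain the Jacobian by automatic differentiation; the per-depth singular values could then be compared against the inverse-depth scaling listed above.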
Abstract
The ResNet architecture has been widely adopted in deep learning due to its significant boost to performance through the use of simple skip connections, yet the underlying mechanisms leading to its success remain largely unknown. In this paper, we conduct a thorough empirical study of the ResNet architecture in classification tasks by linearizing its constituent residual blocks using Residual Jacobians and measuring their singular value decompositions. Our measurements reveal a process called Residual Alignment (RA) characterized by four properties: (RA1) intermediate representations of a given input are equispaced on a line, embedded in high dimensional space, as observed by Gai and Zhang [2021]; (RA2) top left and right singular vectors of Residual Jacobians align with each other and across different depths; (RA3) Residual Jacobians are at most rank C for fully-connected…
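Property (RA1) states that the intermediate representations of an input are equispaced on a line. One way to quantify this, sketched here under the assumption that all representations share the same dimension, is to compare the increments between consecutive layers: on an equispaced line they all have the same direction and the same length. The function name and the two statistics are illustrative choices, not taken from the paper.

```python
import numpy as np

def line_equispacing_stats(hs):
    """hs: array of shape (L + 1, d) holding one input's representation at each depth.

    Returns (minimum cosine similarity between any two increments,
             coefficient of variation of the increment norms).
    Values near (1, 0) indicate points equispaced on a line.
    """
    steps = np.diff(hs, axis=0)              # increments h_{i+1} - h_i
    norms = np.linalg.norm(steps, axis=1)
    units = steps / norms[:, None]
    cosines = units @ units.T
    return cosines.min(), norms.std() / norms.mean()

# Synthetic check: points constructed to be exactly equispaced on a line.
start = np.array([0.0, 1.0, 2.0])
direction = np.array([1.0, -1.0, 0.5])
hs = np.stack([start + t * direction for t in range(5)])
min_cos, norm_cv = line_equispacing_stats(hs)
```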
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Machine Learning in Materials Science · Integrated Circuits and Semiconductor Failure Analysis
Methods: Batch Normalization · Average Pooling · Residual Connection · 1x1 Convolution · Global Average Pooling · Max Pooling · Convolution · Bottleneck Residual Block · Kaiming Initialization
