Field theory for optimal signal propagation in ResNets
Kirsten Fischer, David Dahmen, Moritz Helias

TL;DR
This paper develops a finite-size field theory for residual networks (ResNets) to explain how the scaling parameter of the residual branch affects signal propagation. The theory accounts for the near-universal optimal value of this parameter and relates it to the network's sensitivity.
Contribution
The paper introduces a systematic finite-size field theory for ResNets and derives analytical expressions for the response function and for the optimal residual scaling, which explain the empirically observed range of beneficial scaling values.
Findings
The optimal scaling parameter lies within the range of maximal sensitivity.
The optimal scaling depends only weakly on other network hyperparameters.
The theory explains the universality of the residual scaling choice.
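To make the role of the residual scaling concrete, the following is a minimal sketch, not the authors' method. It assumes a common residual update of the form h <- h + rho * W tanh(h), with fresh Gaussian weights per layer; the update rule, the tanh nonlinearity, the weight variance, and all function names (`resnet_forward`, `output_distance`) are illustrative assumptions that do not come from this summary. The distance between the outputs for an input and a slightly perturbed copy serves as a crude empirical stand-in for sensitivity; it is not the response function derived in the paper.

```python
import math
import random


def resnet_forward(x, depth, rho, sigma_w=1.0, seed=0):
    """Propagate x through `depth` residual layers.

    Assumed update (illustrative, not taken from the paper):
        h <- h + rho * W @ tanh(h),   W_ij ~ N(0, sigma_w^2 / n)
    """
    rng = random.Random(seed)  # same seed -> same weights for every call
    n = len(x)
    h = list(x)
    for _ in range(depth):
        phi = [math.tanh(v) for v in h]
        new_h = []
        for i in range(n):
            # Draw one row of the layer's weight matrix.
            w_row = [rng.gauss(0.0, sigma_w / math.sqrt(n)) for _ in range(n)]
            residual = sum(w * p for w, p in zip(w_row, phi))
            # Skip connection plus scaled residual branch.
            new_h.append(h[i] + rho * residual)
        h = new_h
    return h


def output_distance(rho, depth=20, width=30, eps=1e-3):
    """Euclidean distance between outputs for an input and a perturbed copy."""
    rng = random.Random(123)
    x = [rng.gauss(0.0, 1.0) for _ in range(width)]
    x_pert = [v + eps * rng.gauss(0.0, 1.0) for v in x]
    a = resnet_forward(x, depth, rho)
    b = resnet_forward(x_pert, depth, rho)
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


if __name__ == "__main__":
    # Scan a few hypothetical scaling values and report the output distance.
    for rho in (0.0, 0.1, 0.3, 1.0):
        print(f"rho={rho:.1f}  output distance={output_distance(rho):.3e}")
```

With rho = 0 the residual branch is switched off and the network reduces to the identity, so the output distance equals the input perturbation. Scanning rho in this way gives a rough numerical feel for how strongly the scaling controls the propagation of input differences through depth.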
Abstract
Residual networks have significantly better trainability and thus performance than feed-forward networks at large depth. Introducing skip connections facilitates signal propagation to deeper layers. In addition, previous works found that adding a scaling parameter for the residual branch further improves generalization performance. While they empirically identified a particularly beneficial range of values for this scaling parameter, the associated performance improvement and its universality across network hyperparameters have yet to be understood. For feed-forward networks, finite-size theories have led to important insights with regard to signal propagation and hyperparameter tuning. We here derive a systematic finite-size field theory for residual networks to study signal propagation and its dependence on the scaling for the residual branch. We derive analytical expressions for the…
Taxonomy
Topics: Ultrasonics and Acoustic Wave Propagation · Underwater Acoustics Research · Geophysical Methods and Applications
