ResNets Are Deeper Than You Think
Christian H.X. Ali Mehmeti-Göpel, Michael Wand

TL;DR
This paper argues that residual networks offer benefits beyond just easier training, as they operate in a different function space and have an inductive bias better suited for natural data, explaining their widespread success.
Contribution
It introduces the idea that residual networks inhabit a different function space than feedforward networks, providing a new perspective on their advantages beyond optimization.
Findings
Variable-depth architectures, similar to ResNets, generalize better than fixed-depth networks in a controlled post-training comparison.
Residual connections provide an inductive bias aligned with natural data.
Performance gains are not solely due to improved trainability.
Abstract
Residual connections remain ubiquitous in modern neural network architectures nearly a decade after their introduction. Their widespread adoption is often credited to their dramatically improved trainability: residual networks train faster, more stably, and achieve higher accuracy than their feedforward counterparts. While numerous techniques, ranging from improved initialization to advanced learning rate schedules, have been proposed to close the performance gap between residual and feedforward networks, this gap has persisted. In this work, we propose an alternative explanation: residual networks do not merely reparameterize feedforward networks, but instead inhabit a different function space. We design a controlled post-training comparison to isolate generalization performance from trainability; we find that variable-depth architectures, similar to ResNets, consistently outperform…
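The contrast between a residual stack and a plain feedforward stack can be illustrated with a small sketch. This is not the paper's experimental setup: the linear blocks, the weights, and all function names are assumptions chosen only to keep the algebra visible. With two linear residual blocks, the output expands as (I + W2)(I + W1)x = x + W1 x + W2 x + W2 W1 x, so the result mixes paths of depth 0, 1, and 2, whereas the feedforward stack contains only the depth-2 path. This is one possible reading of "variable-depth", offered as an illustration rather than as the authors' definition.

```python
# Illustrative sketch only: linear blocks and hypothetical names,
# not the architecture or protocol used in the paper.

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def add(a, b):
    """Elementwise sum of two vectors."""
    return [u + v for u, v in zip(a, b)]

def feedforward(weights, x):
    """Fixed-depth stack: the signal passes through every layer."""
    for W in weights:
        x = matvec(W, x)
    return x

def residual(weights, x):
    """Residual stack: each block adds its output to its input."""
    for W in weights:
        x = add(x, matvec(W, x))
    return x

# Arbitrary example weights and input (assumed values).
W1 = [[0.5, -1.0], [2.0, 0.0]]
W2 = [[1.0, 1.0], [0.0, -0.5]]
x = [1.0, 2.0]

# Unrolling two residual blocks yields four paths of depth 0, 1, 1, and 2.
paths = [
    x,                         # depth 0: identity path
    matvec(W1, x),             # depth 1: through block 1 only
    matvec(W2, x),             # depth 1: through block 2 only
    feedforward([W1, W2], x),  # depth 2: through both blocks
]
unrolled = [sum(p[i] for p in paths) for i in range(len(x))]

res_out = residual([W1, W2], x)    # equals the sum over all paths
ff_out = feedforward([W1, W2], x)  # equals the depth-2 path alone

print("residual:   ", res_out)
print("unrolled:   ", unrolled)
print("feedforward:", ff_out)
```

In this linear toy case the residual output equals the sum over all paths, while the feedforward output is just one of those terms. With nonlinear blocks the clean additive expansion no longer holds exactly, so the sketch should be read as intuition only.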
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning
