ResNets Are Deeper Than You Think

Christian H.X. Ali Mehmeti-Göpel; Michael Wand

arXiv:2506.14386·cs.LG·June 18, 2025

Open Access

TL;DR

This paper argues that the benefits of residual networks go beyond easier training: they operate in a different function space than feedforward networks and carry an inductive bias better suited to natural data, which the authors propose as an explanation for their widespread success.

Contribution

The paper proposes that residual networks inhabit a different function space than feedforward networks rather than merely reparameterizing them, giving a perspective on their advantages that goes beyond optimization.

Findings

01 Residual networks outperform fixed-depth networks in generalization.

02 Residual connections provide an inductive bias aligned with natural data.

03 Performance gains are not solely due to improved trainability.

Abstract

Residual connections remain ubiquitous in modern neural network architectures nearly a decade after their introduction. Their widespread adoption is often credited to their dramatically improved trainability: residual networks train faster, more stably, and achieve higher accuracy than their feedforward counterparts. While numerous techniques, ranging from improved initialization to advanced learning rate schedules, have been proposed to close the performance gap between residual and feedforward networks, this gap has persisted. In this work, we propose an alternative explanation: residual networks do not merely reparameterize feedforward networks, but instead inhabit a different function space. We design a controlled post-training comparison to isolate generalization performance from trainability; we find that variable-depth architectures, similar to ResNets, consistently outperform…
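The contrast between fixed-depth and variable-depth computation can be made concrete with a toy example. The sketch below is an illustration only, not the paper's experimental setup: it assumes purely linear blocks (no nonlinearity or normalization), and the helper names `feedforward`, `residual`, and `sum_over_paths` are hypothetical. Under that assumption, a chain of residual blocks expands exactly into a sum of paths that pass through every subset of layers, so the output mixes contributions of depth 0, 1, and 2, while the plain feedforward chain contains only the single full-depth path.

```python
from itertools import combinations

def matvec(W, x):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def add(u, v):
    """Element-wise sum of two vectors."""
    return [a + b for a, b in zip(u, v)]

def feedforward(weights, x):
    """Fixed-depth chain: the input passes through every layer in order."""
    for W in weights:
        x = matvec(W, x)
    return x

def residual(weights, x):
    """Residual chain: each (linear, toy) block adds its output to its input."""
    for W in weights:
        x = add(x, matvec(W, x))
    return x

def sum_over_paths(weights, x):
    """Expand the linear residual chain into one term per subset of layers."""
    total = [0.0] * len(x)
    for depth in range(len(weights) + 1):
        for subset in combinations(range(len(weights)), depth):
            term = x
            for i in subset:  # indices are increasing, so layer order is preserved
                term = matvec(weights[i], term)
            total = add(total, term)
    return total

# Arbitrary toy weights and input, chosen only for illustration.
W1 = [[0.5, -1.0], [0.25, 0.0]]
W2 = [[1.0, 0.5], [-0.5, 2.0]]
x = [1.0, 2.0]

res_out = residual([W1, W2], x)         # x + W1 x + W2 x + W2 W1 x
path_out = sum_over_paths([W1, W2], x)  # same value, computed path by path
ff_out = feedforward([W1, W2], x)       # only the depth-2 path W2 W1 x

print(res_out, path_out, ff_out)
```

With nonlinear blocks the expansion is no longer an exact sum, so this should be read as intuition for why a residual network is not simply a reparameterized feedforward chain, not as a statement of the paper's formal argument.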

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning