Highway and Residual Networks learn Unrolled Iterative Estimation
Klaus Greff, Rupesh K. Srivastava, Jürgen Schmidhuber

TL;DR
This paper proposes that Highway and Residual networks act as unrolled iterative estimators: successive layers refine their estimates of the same features rather than compute entirely new representations. This challenges the popular view of deep learning as a hierarchical computation of increasingly abstract features at each layer.
Contribution
It introduces the unrolled iterative estimation perspective as a unifying explanation for Highway and Residual networks' success, providing new insights into their operation.
Findings
Highway and Residual networks can be viewed as iterative refinement processes.
This perspective explains their ability to train very deep networks.
Preliminary experiments reveal similarities and differences between the architectures.
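The iterative-refinement reading of the findings above can be illustrated with a small sketch. It uses the standard block forms, x + F(x) for a Residual block and H(x)·T(x) + x·(1 − T(x)) for a Highway block, and stacks several blocks so that each one nudges the same feature estimate closer to a fixed value. Everything else is an assumption made for illustration and is not taken from the paper: the names `TARGET`, `toy_residual`, `toy_transform`, `toy_gate`, and `unroll` are hypothetical, and the "learned" functions are replaced by hand-written ones.

```python
# Toy sketch, not the authors' code: hand-written stand-ins for learned layers.

TARGET = [1.0, -2.0, 0.5]  # hypothetical "true" feature values being estimated


def residual_step(x, f):
    """Residual block: x + F(x), elementwise."""
    return [xi + fi for xi, fi in zip(x, f(x))]


def highway_step(x, h, t):
    """Highway block: H(x) * T(x) + x * (1 - T(x)), elementwise."""
    return [hi * ti + xi * (1.0 - ti) for xi, hi, ti in zip(x, h(x), t(x))]


def toy_residual(x, rate=0.5):
    """Assumed F: a correction that moves the estimate part-way to TARGET."""
    return [rate * (ti - xi) for xi, ti in zip(x, TARGET)]


def toy_transform(x):
    """Assumed H: proposes TARGET directly as the new estimate."""
    return list(TARGET)


def toy_gate(x):
    """Assumed T: a constant gate that mixes old and new estimates equally."""
    return [0.5 for _ in x]


def error(x):
    """Largest absolute deviation of the current estimate from TARGET."""
    return max(abs(xi - ti) for xi, ti in zip(x, TARGET))


def unroll(step, x0, depth):
    """Apply the same kind of block `depth` times, recording the error."""
    x = list(x0)
    errors = [error(x)]
    for _ in range(depth):
        x = step(x)
        errors.append(error(x))
    return x, errors


if __name__ == "__main__":
    start = [0.0, 0.0, 0.0]
    _, res_err = unroll(lambda x: residual_step(x, toy_residual), start, 4)
    _, hwy_err = unroll(lambda x: highway_step(x, toy_transform, toy_gate), start, 4)
    print("residual errors:", res_err)
    print("highway errors: ", hwy_err)
```

In this toy setting every block keeps the representation in the same feature space and only shrinks the remaining error, which is the behaviour the iterative-estimation view attributes to groups of successive layers. Real networks learn F, H, and T, so the sketch shows the shape of the argument rather than any measured result.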
Abstract
The past year saw the introduction of new architectures such as Highway networks and Residual networks which, for the first time, enabled the training of feedforward networks with dozens to hundreds of layers using simple gradient descent. While depth of representation has been posited as a primary reason for their success, there are indications that these architectures defy a popular view of deep learning as a hierarchical computation of increasingly abstract features at each layer. In this report, we argue that this view is incomplete and does not adequately explain several recent findings. We propose an alternative viewpoint based on unrolled iterative estimation -- a group of successive layers iteratively refine their estimates of the same features instead of computing an entirely new representation. We demonstrate that this viewpoint directly leads to the construction of Highway…
Taxonomy
Topics: Machine Learning and Algorithms · Anomaly Detection Techniques and Applications · Multidisciplinary Science and Engineering Research
