Identity Matters in Deep Learning
Moritz Hardt, Tengyu Ma

TL;DR
This paper explores the importance of identity transformations in deep learning, providing theoretical insights into residual networks and demonstrating that simple residual architectures can achieve strong empirical results without normalization techniques.
Contribution
It offers a theoretical foundation for identity parameterization, proves the absence of spurious local optima in linear residual networks, and introduces a simple residual architecture that performs well on benchmarks.
Findings
Linear residual networks have no spurious local optima.
Residual networks with ReLU activations have universal finite-sample expressivity: given enough parameters, they can represent any function of their sample.
A simple residual convolutional network, built without normalization layers, improves on previous all-convolutional networks on CIFAR and ImageNet.
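The identity parameterization behind the first finding can be illustrated with a minimal NumPy sketch. It assumes each linear residual layer computes h + A h, so that all-zero weights give the identity map, whereas a standard linear layer computing A h maps everything to zero at the same point. The function names, dimensions, and random input are hypothetical and chosen only for illustration; this is not the authors' code.

```python
import numpy as np

def linear_residual_forward(x, weights):
    """Apply h <- h + A @ h for each weight matrix A (assumed residual form)."""
    h = x
    for A in weights:
        h = h + A @ h
    return h

def standard_linear_forward(x, weights):
    """Apply h <- A @ h for each weight matrix A (standard parameterization)."""
    h = x
    for A in weights:
        h = A @ h
    return h

rng = np.random.default_rng(0)
d, depth = 4, 10                      # hypothetical width and depth
x = rng.standard_normal((d, 3))       # three example inputs as columns
zero_weights = [np.zeros((d, d)) for _ in range(depth)]

# With all-zero weights the residual network is exactly the identity map,
# while the standard network collapses every input to zero.
residual_out = linear_residual_forward(x, zero_weights)
standard_out = standard_linear_forward(x, zero_weights)
```

In this sketch the zero point of the residual parameterization already represents a useful function (the identity) at any depth, which is the property the identity-parameterization principle asks every layer to express easily.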
Abstract
An emerging design principle in deep learning is that each layer of a deep artificial neural network should be able to easily express the identity transformation. This idea not only motivated various normalization techniques, such as batch normalization, but was also key to the immense success of residual networks. In this work, we put the principle of identity parameterization on a more solid theoretical footing alongside further empirical progress. We first give a strikingly simple proof that arbitrarily deep linear residual networks have no spurious local optima. The same result for linear feed-forward networks in their standard parameterization is substantially more delicate. Second, we show that residual networks with ReLU activations have universal finite-sample expressivity in the sense that the network can represent any function of its sample provided that…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications
