Identity Matters in Deep Learning

Moritz Hardt; Tengyu Ma

arXiv:1611.04231·cs.LG·July 23, 2018·76 cites

TL;DR

This paper explores the importance of identity transformations in deep learning, providing theoretical insights into residual networks and demonstrating that simple residual architectures can achieve strong empirical results without normalization techniques.

Contribution

It offers a theoretical foundation for identity parameterization, proves the absence of spurious local optima in linear residual networks, and introduces a simple residual architecture that performs well on benchmarks.

Findings

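01. Arbitrarily deep linear residual networks have no spurious local optima.

02. Residual networks with ReLU activations have universal finite-sample expressivity: given enough parameters, the network can represent any function of its sample.

03. A simple residual convolutional network, built without batch normalization, dropout, or max pooling, improves on previous all-convolutional networks on CIFAR10, CIFAR100, and ImageNet.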

Abstract

An emerging design principle in deep learning is that each layer of a deep artificial neural network should be able to easily express the identity transformation. This idea not only motivated various normalization techniques, such as batch normalization, but was also key to the immense success of residual networks. In this work, we put the principle of identity parameterization on a more solid theoretical footing alongside further empirical progress. We first give a strikingly simple proof that arbitrarily deep linear residual networks have no spurious local optima. The same result for linear feed-forward networks in their standard parameterization is substantially more delicate. Second, we show that residual networks with ReLU activations have universal finite-sample expressivity in the sense that the network can represent any function of its sample provided that…
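The identity parameterization described in the abstract can be illustrated with a small sketch. This is not the authors' code: the function names (`residual_forward`, `plain_forward`), the dimensions, and the perturbation scale are assumptions chosen for illustration. It compares a linear residual network, where each layer computes h + A h, with the standard parameterization, where each layer computes A h. With all weights at zero, the residual network is exactly the identity map, while the standard network collapses every input to zero.

```python
import random

def matvec(a, x):
    """Multiply a square matrix (given as a list of rows) by a vector."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]

def residual_forward(weights, x):
    """Linear residual network: apply h <- h + A h for each A in weights."""
    h = list(x)
    for a in weights:
        h = [h_i + d_i for h_i, d_i in zip(h, matvec(a, h))]
    return h

def plain_forward(weights, x):
    """Standard parameterization: apply h <- A h for each A in weights."""
    h = list(x)
    for a in weights:
        h = matvec(a, h)
    return h

def zero_matrix(dim):
    return [[0.0] * dim for _ in range(dim)]

def small_random_matrix(dim, scale, rng):
    return [[rng.uniform(-scale, scale) for _ in range(dim)] for _ in range(dim)]

# Hypothetical sizes, chosen only for the illustration.
dim, depth = 3, 50
x = [1.0, -2.0, 0.5]

# All-zero weights: the residual network is the identity map,
# the standard network sends every input to the zero vector.
zero_weights = [zero_matrix(dim) for _ in range(depth)]
identity_out = residual_forward(zero_weights, x)
collapsed_out = plain_forward(zero_weights, x)

# Small weights: the residual network stays close to the identity
# even at depth 50.
rng = random.Random(0)
small_weights = [small_random_matrix(dim, 1e-4, rng) for _ in range(depth)]
near_identity_out = residual_forward(small_weights, x)
max_deviation = max(abs(o - x_i) for o, x_i in zip(near_identity_out, x))
```

In this sketch, weights near zero correspond to a map near the identity for the residual form, whereas in the standard form each layer's weights must themselves be near the identity matrix to achieve the same effect.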

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications
