Theory II: Landscape of the Empirical Risk in Deep Learning

Qianli Liao; Tomaso Poggio

arXiv:1703.09833 · cs.LG · June 23, 2017 · 47 cites

Open Access

TL;DR

This paper combines theory and experiments to analyze the landscape of empirical risk in overparametrized deep convolutional neural networks, revealing many degenerate global minima and suggesting the loss surface may be simpler than previously thought.

Contribution

It characterizes the empirical risk landscape of overparametrized DCNNs, proving the existence of many degenerate global minima and visualizing the training process.

Findings

01

Existence of numerous degenerate global minima with zero empirical error.

02

The empirical risk landscape can be simpler than traditionally believed.

03

SGD tends to find the most robust zero-minimizer.

Abstract

Previous theoretical work on deep learning and neural network optimization tends to focus on avoiding saddle points and local minima. However, the practical observation is that, at least in the case of the most successful Deep Convolutional Neural Networks (DCNNs), practitioners can always increase the network size to fit the training data (an extreme example would be [1]). The most successful DCNNs, such as VGG and ResNets, are best used with a degree of "overparametrization". In this work, we characterize, with a mix of theory and experiments, the landscape of the empirical risk of overparametrized DCNNs. We first prove, in the regression framework, the existence of a large number of degenerate global minimizers with zero empirical error (modulo inconsistent equations). The argument, which relies on Bézout's theorem, is rigorous when the ReLUs are replaced by a polynomial nonlinearity…
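The claim about degenerate zero-error minimizers can be illustrated with a much simpler setting than the one the paper treats. The sketch below is not the paper's construction: it assumes a regression model that is linear in its trainable weights over fixed random Gaussian features (a stand-in chosen for numerical convenience), with more weights than training points. The function name `zero_error_family` and all sizes are hypothetical. In that setting the zero-error solutions form an affine subspace, so moving a minimizer along any null-space direction of the feature matrix leaves the empirical risk at zero.

```python
import numpy as np


def zero_error_family(n_points=5, n_params=12, seed=0):
    """Toy illustration (an assumption, not the paper's setup):
    an overparametrized least-squares problem with n_params > n_points."""
    rng = np.random.default_rng(seed)

    # Fixed random features and arbitrary regression targets.
    features = rng.normal(size=(n_points, n_params))
    targets = rng.normal(size=n_points)

    def empirical_risk(weights):
        # Mean squared error on the training set.
        return float(np.mean((features @ weights - targets) ** 2))

    # Minimum-norm solution of the underdetermined system features @ w = targets.
    w_min, *_ = np.linalg.lstsq(features, targets, rcond=None)

    # Rows of vt beyond the first n_points span the null space of `features`
    # (the Gaussian matrix has full row rank with probability one).
    _, _, vt = np.linalg.svd(features)
    null_basis = vt[n_points:]

    # A different weight vector that fits the data equally well.
    w_other = w_min + 3.0 * null_basis[0]

    return {
        "risk_min_norm": empirical_risk(w_min),
        "risk_shifted": empirical_risk(w_other),
        "flat_directions": null_basis.shape[0],
        "distance": float(np.linalg.norm(w_other - w_min)),
    }


if __name__ == "__main__":
    print(zero_error_family())
```

With 12 weights and 5 data points there are 7 flat directions at every zero-error solution, which is the sense of "degenerate" in this toy case. The networks in the paper are polynomial in their weights rather than linear, so the solution set there is a more complicated variety than an affine subspace.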

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning · Machine Learning and Algorithms

Methods: Dropout · Dense Connections · Max Pooling · Softmax · Convolution · Stochastic Gradient Descent