Characterization of Gradient Dominance and Regularity Conditions for Neural Networks

Yi Zhou; Yingbin Liang

arXiv:1710.06910 · stat.ML · October 23, 2017 · 27 cites

Open Access

TL;DR

This paper analyzes the square-loss landscape of linear networks, linear residual networks, and nonlinear networks with one hidden layer. It characterizes their global minimizers explicitly when the parameter matrices are square, and establishes gradient dominance and regularity conditions that help explain optimization behavior.

Contribution

The paper gives explicit descriptions of the global minimizers of the square loss for three types of networks, and proves two quadratic-type landscape properties for them: a gradient dominance condition and a regularity condition.

Findings

01

Explicit characterization of global minimizers for linear networks, linear residual networks, and nonlinear networks with one hidden layer, when the parameter matrices are square.

02

Proof of the gradient dominance condition within a neighborhood of full-rank global minimizers.

03

Establishment of regularity conditions along certain directions.

Abstract

The past decade has witnessed a successful application of deep learning to solving many challenging problems in machine learning and artificial intelligence. However, the loss functions of deep neural networks (especially nonlinear networks) are still far from being well understood from a theoretical aspect. In this paper, we enrich the current understanding of the landscape of the square loss functions for three types of neural networks. Specifically, when the parameter matrices are square, we provide an explicit characterization of the global minimizers for linear networks, linear residual networks, and nonlinear networks with one hidden layer. Then, we establish two quadratic types of landscape properties for the square loss of these neural networks, i.e., the gradient dominance condition within the neighborhood of their full rank global minimizers, and the regularity condition along…
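
As a rough illustration of what a gradient dominance condition says, the sketch below checks the standard inequality f(W) - f* <= lambda * ||grad f(W)||^2 numerically for a two-layer linear network with square parameter matrices and square loss. Everything here is an assumption made for illustration, not taken from the paper: the function names, the choice X = I, the sampling radius, and the constant lambda = 1 / (2 * sigma_min(W1 X)^2), which follows from the elementary bound ||E A^T||_F >= sigma_min(A) ||E||_F rather than from the paper's own analysis.

```python
import numpy as np


def loss_and_grads(W1, W2, X, Y):
    """Square loss 0.5 * ||W2 W1 X - Y||_F^2 and its gradients w.r.t. W1 and W2."""
    E = W2 @ W1 @ X - Y                 # residual matrix
    loss = 0.5 * np.sum(E ** 2)
    g1 = W2.T @ E @ X.T                 # gradient with respect to W1
    g2 = E @ (W1 @ X).T                 # gradient with respect to W2
    return loss, g1, g2


def dominance_ratios(d=4, trials=200, radius=0.1, seed=0):
    """Sample points near a global minimizer; return (ratio, bound) pairs.

    ratio = (f - f*) / ||grad f||^2 with f* = 0 by construction.
    bound = 1 / (2 * sigma_min(W1 X)^2), an illustrative constant (assumption).
    """
    rng = np.random.default_rng(seed)
    X = np.eye(d)                       # hypothetical input data
    # A global minimizer that is full rank for small perturbations of I.
    W1_star = np.eye(d) + 0.1 * rng.standard_normal((d, d))
    W2_star = np.eye(d) + 0.1 * rng.standard_normal((d, d))
    Y = W2_star @ W1_star @ X           # targets chosen so the minimum loss is 0

    pairs = []
    for _ in range(trials):
        # Random perturbation of the minimizer within a small neighborhood.
        W1 = W1_star + radius * rng.standard_normal((d, d)) / d
        W2 = W2_star + radius * rng.standard_normal((d, d)) / d
        loss, g1, g2 = loss_and_grads(W1, W2, X, Y)
        grad_sq = np.sum(g1 ** 2) + np.sum(g2 ** 2)
        s_min = np.linalg.svd(W1 @ X, compute_uv=False)[-1]
        pairs.append((loss / grad_sq, 1.0 / (2.0 * s_min ** 2)))
    return pairs


if __name__ == "__main__":
    pairs = dominance_ratios()
    worst = max(r for r, _ in pairs)
    print(f"largest observed (f - f*) / ||grad f||^2: {worst:.4f}")
    print("all samples within the illustrative bound:",
          all(r <= b * (1 + 1e-9) for r, b in pairs))
```

A bounded ratio over the sampled neighborhood is consistent with gradient dominance holding there: the loss gap is controlled by the squared gradient norm, so a small gradient implies a near-optimal loss. The sketch does not reproduce the paper's neighborhood size or constants.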

Taxonomy

Topics: Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques · Matrix Theory and Algorithms