Characterization of Gradient Dominance and Regularity Conditions for Neural Networks
Yi Zhou, Yingbin Liang

TL;DR
This paper analyzes the square-loss landscape of linear networks, linear residual networks, and nonlinear networks with one hidden layer. When the parameter matrices are square, it characterizes the global minimizers explicitly and establishes gradient dominance and regularity conditions that help explain optimization behavior.
Contribution
Explicit descriptions of the global minimizers, together with proofs of gradient dominance and regularity conditions, for the square loss of linear, linear residual, and one-hidden-layer nonlinear networks.
Findings
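Explicit characterization of global minimizers for linear networks, linear residual networks, and one-hidden-layer nonlinear networks when the parameter matrices are square.
Proof of a gradient dominance condition within the neighborhood of full-rank global minimizers.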
Establishment of regularity conditions along certain directions.
Abstract
The past decade has witnessed a successful application of deep learning to solving many challenging problems in machine learning and artificial intelligence. However, the loss functions of deep neural networks (especially nonlinear networks) are still far from being well understood from a theoretical aspect. In this paper, we enrich the current understanding of the landscape of the square loss functions for three types of neural networks. Specifically, when the parameter matrices are square, we provide an explicit characterization of the global minimizers for linear networks, linear residual networks, and nonlinear networks with one hidden layer. Then, we establish two quadratic types of landscape properties for the square loss of these neural networks, i.e., the gradient dominance condition within the neighborhood of their full rank global minimizers, and the regularity condition along…
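To make the gradient dominance condition concrete, the sketch below checks it numerically for a two-layer linear network with square parameter matrices. It is an illustration under stated assumptions, not the paper's construction: the condition is taken in its usual form f(W) - f* <= lambda * ||grad f(W)||^2, the data X and the minimizer (W1_star, W2_star) are random and hypothetical, and the constant used is an elementary singular-value bound rather than the constant derived in the paper.

```python
import numpy as np

def square_loss(W1, W2, X, Y):
    """Square loss 0.5 * ||W2 W1 X - Y||_F^2 of a two-layer linear network."""
    R = W2 @ W1 @ X - Y
    return 0.5 * np.sum(R ** 2)

def gradients(W1, W2, X, Y):
    """Gradients of the square loss with respect to W1 and W2."""
    R = W2 @ W1 @ X - Y
    g1 = W2.T @ R @ X.T        # d loss / d W1
    g2 = R @ (W1 @ X).T        # d loss / d W2
    return g1, g2

def smallest_singular_value(M):
    return np.linalg.svd(M, compute_uv=False).min()

def dominance_check(d=4, n_trials=200, radius=1e-2, seed=0):
    """Sample points near a global minimizer and compare the ratio
    (f - f*) / ||grad f||^2 against an elementary upper bound."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((d, d))          # assumed square, full rank
    W1_star = rng.standard_normal((d, d))    # hypothetical minimizer
    W2_star = rng.standard_normal((d, d))
    Y = W2_star @ W1_star @ X                # makes f* = 0 at the minimizer
    ratios, bounds = [], []
    for _ in range(n_trials):
        W1 = W1_star + radius * rng.standard_normal((d, d))
        W2 = W2_star + radius * rng.standard_normal((d, d))
        f = square_loss(W1, W2, X, Y)
        g1, g2 = gradients(W1, W2, X, Y)
        grad_sq = np.sum(g1 ** 2) + np.sum(g2 ** 2)
        # For square matrices: ||g2||_F >= s_min(W1 X) * ||R||_F and
        # ||g1||_F >= s_min(W2) * s_min(X) * ||R||_F, so f <= lam * grad_sq.
        s_a = smallest_singular_value(W1 @ X)
        s_b = smallest_singular_value(W2) * smallest_singular_value(X)
        lam = 1.0 / (2.0 * (s_a ** 2 + s_b ** 2))
        ratios.append(f / grad_sq)
        bounds.append(lam)
    return np.array(ratios), np.array(bounds)

ratios, bounds = dominance_check()
print("largest observed ratio:", ratios.max())
print("bound holds at every sample:", bool(np.all(ratios <= bounds * (1 + 1e-6))))
```

The bound degrades as the smallest singular values approach zero, which gives some intuition for why the condition is stated around full-rank global minimizers.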
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques · Matrix Theory and Algorithms
