Embedding Principle: a hierarchical structure of loss landscape of deep neural networks

Yaoyu Zhang; Yuqing Li; Zhongwang Zhang; Tao Luo; Zhi-Qin John Xu

arXiv:2111.15527·cs.LG·December 1, 2021

TL;DR

This paper proves a hierarchical structure of the loss landscape in deep neural networks, showing how critical points of narrower networks embed into wider ones and explaining the prevalence of strict saddle points in wide NNs.

Contribution

It introduces the Embedding Principle, revealing a hierarchical structure of loss landscapes and critical points across different network widths, with implications for optimization.

Findings

01. The loss landscape of an NN contains all critical points of all narrower NNs.

02. Under a critical embedding, the number of negative, zero, or positive eigenvalues of the Hessian at a critical point may increase but never decreases as the network becomes wider.

03. Strict saddle points are common in wide NNs, which aids optimization.

Abstract

We prove a general Embedding Principle of loss landscape of deep neural networks (NNs) that unravels a hierarchical structure of the loss landscape of NNs, i.e., loss landscape of an NN contains all critical points of all the narrower NNs. This result is obtained by constructing a class of critical embeddings which map any critical point of a narrower NN to a critical point of the target NN with the same output function. By discovering a wide class of general compatible critical embeddings, we provide a gross estimate of the dimension of critical submanifolds embedded from critical points of narrower NNs. We further prove an irreversibility property of any critical embedding that the number of negative/zero/positive eigenvalues of the Hessian matrix of a critical point may increase but never decrease as an NN becomes wider through the embedding. Using a special realization of general…
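To make the idea of a critical embedding concrete, here is a minimal sketch under stated assumptions. It uses neuron splitting on a toy two-layer tanh network with scalar input: one hidden neuron is replaced by two copies that share its input weights and divide its output weight in proportions `alpha` and `1 - alpha`. The text above does not say which construction the paper uses, so the splitting rule, the mean squared error loss, and the names `forward`, `loss_grad`, and `split_neuron` are illustrative assumptions, not the authors' code. The sketch checks two things: the wider network computes the same output, and its gradient is a fixed linear image of the narrower network's gradient, so a zero gradient in the narrow network gives a zero gradient in the wide one.

```python
import math

def forward(params, x):
    # params: list of (a, w, b) tuples, one per hidden neuron.
    # Output: sum_j a_j * tanh(w_j * x + b_j)
    return sum(a * math.tanh(w * x + b) for a, w, b in params)

def loss_grad(params, data):
    # Gradient of the loss (1 / (2n)) * sum_i (f(x_i) - y_i)^2
    # with respect to each neuron's (a, w, b).
    n = len(data)
    grads = []
    for a, w, b in params:
        ga = gw = gb = 0.0
        for x, y in data:
            r = forward(params, x) - y      # residual on this sample
            t = math.tanh(w * x + b)        # hidden activation
            ga += r * t / n
            gw += r * a * (1.0 - t * t) * x / n
            gb += r * a * (1.0 - t * t) / n
        grads.append((ga, gw, gb))
    return grads

def split_neuron(params, j, alpha):
    # Hypothetical splitting embedding: neuron j becomes two neurons with
    # identical (w, b) and output weights alpha * a and (1 - alpha) * a.
    a, w, b = params[j]
    return params[:j] + [(alpha * a, w, b), ((1.0 - alpha) * a, w, b)] + params[j + 1:]

# Arbitrary illustrative values (not taken from the paper).
narrow = [(0.7, 1.3, -0.2), (-0.5, 0.4, 0.9)]
data = [(-1.0, 0.2), (0.0, -0.1), (1.5, 0.8)]
alpha = 0.3

wide = split_neuron(narrow, 0, alpha)
g_narrow = loss_grad(narrow, data)
g_wide = loss_grad(wide, data)

# Largest output difference between the two networks on the data inputs.
max_output_gap = max(abs(forward(wide, x) - forward(narrow, x)) for x, _ in data)
print("max output gap:", max_output_gap)
print("narrow gradient, neuron 0:", g_narrow[0])
print("wide gradient, copies of neuron 0:", g_wide[0], g_wide[1])
```

In this sketch each copy's gradient with respect to `a` equals the original neuron's, and its gradients with respect to `w` and `b` are the original's scaled by `alpha` or `1 - alpha`. The relation holds at any parameter point, which is why no optimization is needed to check it: whenever the narrow gradient vanishes, the wide one does too.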

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Adversarial Robustness in Machine Learning · Sparse and Compressive Sensing Techniques