Embedding Principle in Depth for the Loss Landscape Analysis of Deep Neural Networks

Zhiwei Bai; Tao Luo; Zhi-Qin John Xu; Yaoyu Zhang

arXiv:2205.13283 · cs.LG · April 15, 2025

Open Access

TL;DR

This paper introduces an embedding principle in depth that reveals how the loss landscape of deep neural networks contains all critical points of shallower networks, providing new insights into training dynamics and the effects of batch normalization.

Contribution

The work proposes a critical lifting operator and uncovers an embedding principle that links the loss landscapes of networks of different depths, advancing theoretical understanding of deep learning.

Findings

01. Local minima can be lifted to saddle points in deeper networks

02. Batch normalization suppresses critical manifolds from shallower networks

03. Increasing training data shrinks critical manifolds, speeding up training

Abstract

Understanding the relation between deep and shallow neural networks is extremely important for the theoretical study of deep learning. In this work, we discover an embedding principle in depth: the loss landscape of an NN "contains" all critical points of the loss landscapes of shallower NNs. The key tool for our discovery is the critical lifting operator proposed in this work, which maps any critical point of a network to critical manifolds of any deeper network while preserving the outputs. This principle provides new insights into many widely observed behaviors of DNNs. Regarding the easy training of deep networks, we show that a local minimum of an NN can be lifted to strict saddle points of a deeper NN. Regarding the acceleration effect of batch normalization, we demonstrate that batch normalization helps avoid the critical manifolds lifted from shallower NNs by suppressing layer…
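The output-preserving part of the idea can be illustrated with a small NumPy sketch. This is not the paper's critical lifting operator; it is a hypothetical construction, assuming a ReLU MLP, that inserts one extra layer at the input and keeps it in the linear region of ReLU on the training inputs. The names `mlp_forward`, `lift_input_layer`, and `margin` are made up for illustration, and the sketch only checks that outputs agree on the training data, not that critical points are preserved.

```python
import numpy as np

def mlp_forward(weights, biases, x):
    """ReLU MLP forward pass; the last layer is linear. x has shape (n, d_in)."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(h @ W + b, 0.0)
    return h @ weights[-1] + biases[-1]

def lift_input_layer(weights, biases, x_train, margin=1.0):
    """Return a network one layer deeper with the same outputs on x_train.

    Assumption (illustrative only): the new first layer uses identity weights
    and a bias shift large enough that every training input stays strictly
    positive, so ReLU acts as the identity there. The shift is then undone
    in the bias of the following layer.
    """
    d_in = x_train.shape[1]
    # Per-coordinate shift so that x + shift >= margin > 0 on the training set.
    shift = np.maximum(-x_train.min(axis=0), 0.0) + margin
    new_weights = [np.eye(d_in)] + list(weights)
    # (x + shift) @ W1 + (b1 - shift @ W1) == x @ W1 + b1
    new_biases = [shift, biases[0] - shift @ weights[0]] + list(biases[1:])
    return new_weights, new_biases

# Small demo with random parameters and data.
rng = np.random.default_rng(0)
x_train = rng.normal(size=(20, 3))
weights = [rng.normal(size=(3, 5)), rng.normal(size=(5, 1))]
biases = [rng.normal(size=5), rng.normal(size=1)]

deep_weights, deep_biases = lift_input_layer(weights, biases, x_train)
same = np.allclose(
    mlp_forward(weights, biases, x_train),
    mlp_forward(deep_weights, deep_biases, x_train),
)
print("outputs preserved on training data:", same)
```

Because the shift is computed from the training inputs, the deeper network is only guaranteed to match the shallower one on inputs that keep the inserted layer in its linear region; a different dataset would in general require a different shift.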

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Adversarial Robustness in Machine Learning · Advanced Neural Network Applications

Methods: Batch Normalization