Exact Solutions of a Deep Linear Network

Liu Ziyin; Botao Li; Xiangming Meng

arXiv:2202.04777 · stat.ML · June 14, 2023 · 1 cites

TL;DR

This paper derives the exact global minima of deep linear networks with weight decay and stochastic neurons, and shows that weight decay can create bad minima at the origin once the network has more than one hidden layer.

Contribution

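It provides analytical expressions for the global minima of deep linear networks with weight decay and stochastic neurons, and shows that the effect of weight decay on the loss landscape depends on the architecture, which limits how much common initialization methods can help optimization.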

Findings

01

The origin is a special point where nonlinear phenomena emerge.

02

Weight decay can create bad minima at zero in networks with multiple hidden layers.

03

Common initialization methods may be insufficient for effective optimization.
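Findings 01 and 02 can be illustrated with a minimal sketch on a toy model. The model is an assumption made here for illustration, not the paper's setting: a scalar deep linear network with a single data point (x = 1, y = 1), no stochastic neurons, and loss (w_1 ⋯ w_D · x − y)² + λ Σ w_i². With one hidden layer (two weights) the data term contributes a second-order cross term at the origin, so the origin is a saddle when λ is small. With two hidden layers (three weights) the data term is cubic near the origin, so weight decay alone makes the origin a local minimum for any λ > 0, even though a lower loss exists elsewhere. The function names and the values of `lam` and `eps` are hypothetical.

```python
from itertools import product
from math import prod, sqrt


def loss(weights, lam, x=1.0, y=1.0):
    """Squared error of a scalar deep linear net plus L2 weight decay."""
    prediction = prod(weights) * x
    return (prediction - y) ** 2 + lam * sum(w * w for w in weights)


def origin_passes_probe(n_hidden, lam, eps=1e-2):
    """Probe the origin along every sign-pattern direction of norm eps.

    Returns True if none of the probed directions lowers the loss.
    This is a finite probe, not a proof of local minimality.
    """
    n_weights = n_hidden + 1  # scalar net: one weight per layer
    base = loss([0.0] * n_weights, lam)
    scale = eps / sqrt(n_weights)
    for signs in product((-1.0, 1.0), repeat=n_weights):
        if loss([s * scale for s in signs], lam) < base:
            return False
    return True


lam = 0.1  # illustrative weight decay strength (assumption)
one_hidden = origin_passes_probe(1, lam)  # origin is a saddle here
two_hidden = origin_passes_probe(2, lam)  # origin looks like a local minimum

# For two hidden layers, compare the origin with a point far from it.
loss_at_origin = loss([0.0, 0.0, 0.0], lam)
loss_away = loss([0.9, 0.9, 0.9], lam)

print("1 hidden layer, origin passes probe:", one_hidden)
print("2 hidden layers, origin passes probe:", two_hidden)
print("2 hidden layers, loss at origin vs. away:", loss_at_origin, loss_away)
```

In this toy setting the probe finds a descent direction at the origin for one hidden layer and none for two hidden layers, while the point away from the origin has lower loss, which is what makes the zero solution a bad minimum.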

Abstract

This work finds the analytical expression of the global minima of a deep linear network with weight decay and stochastic neurons, a fundamental model for understanding the landscape of neural networks. Our result implies that the origin is a special point in the loss landscape of deep neural networks, where highly nonlinear phenomena emerge. We show that weight decay strongly interacts with the model architecture and can create bad minima at zero in a network with more than $1$ hidden layer, which is qualitatively different from a network with only $1$ hidden layer. Practically, our result implies that common deep learning initialization methods are insufficient to ease the optimization of neural networks in general.
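The practical remark about initialization can be sketched with plain gradient descent on a toy problem. Everything here is an illustrative assumption rather than the paper's experiment: a scalar network with two hidden layers (three weights), one data point (x = 1, y = 1), weight decay `lam = 0.1`, no stochastic neurons, and two hand-picked starting points standing in for a small-scale and a large-scale initialization.

```python
from math import prod


def loss(weights, lam, x=1.0, y=1.0):
    """Squared error of a scalar deep linear net plus L2 weight decay."""
    return (prod(weights) * x - y) ** 2 + lam * sum(w * w for w in weights)


def grad(weights, lam, x=1.0, y=1.0):
    """Gradient of `loss` with respect to each weight."""
    residual = prod(weights) * x - y
    grads = []
    for i, w in enumerate(weights):
        # Product of all weights except the i-th one.
        others = prod(weights[:i] + weights[i + 1:])
        grads.append(2.0 * residual * x * others + 2.0 * lam * w)
    return grads


def gradient_descent(init, lam=0.1, lr=0.05, steps=5000):
    """Run fixed-step gradient descent from `init` and return the final weights."""
    weights = list(init)
    for _ in range(steps):
        g = grad(weights, lam)
        weights = [w - lr * gi for w, gi in zip(weights, g)]
    return weights


lam = 0.1
small_init = gradient_descent([0.05, 0.05, 0.05], lam)  # starts near the origin
large_init = gradient_descent([0.9, 0.9, 0.9], lam)     # starts far from it

print("small init ->", small_init, "loss", loss(small_init, lam))
print("large init ->", large_init, "loss", loss(large_init, lam))
```

In this sketch the run that starts near the origin collapses to zero, while the run that starts farther away settles at a nonzero solution with lower loss. The scalar model says nothing about width-dependent initialization schemes; it only shows how a start close to the origin can be trapped by the minimum that weight decay creates there.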

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Applications · Markov Chains and Monte Carlo Methods

Methods: Weight Decay