Exact Solutions of a Deep Linear Network
Liu Ziyin, Botao Li, Xiangming Meng

TL;DR
This paper derives the exact global minima of deep linear networks with weight decay and stochastic neurons, showing that the origin is a special point of the loss landscape and that weight decay can make optimization harder by creating bad minima there.
Contribution
It provides analytical expressions for the global minima of deep linear networks, showing how weight decay interacts with the architecture to shape the loss landscape and why common initialization methods may not be enough to ease optimization.
Findings
The origin is a special point where nonlinear phenomena emerge.
Weight decay can create bad minima at zero in networks with multiple hidden layers.
Common initialization methods may be insufficient for effective optimization.
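The second finding can be illustrated on a toy problem. The sketch below is not the paper's model: it assumes a width-one (scalar) linear network with no stochastic neurons, a single data point x = y = 1, and a weight decay strength of 0.1. The names `loss`, `hessian`, and `min_eig_at_origin` are hypothetical helpers introduced for this example. It estimates the Hessian of the regularized loss at the origin by central finite differences and checks its smallest eigenvalue.

```python
import numpy as np

def loss(w, x=1.0, y=1.0, lam=0.1):
    # Scalar deep linear network: the prediction is the product of all weights times x.
    # Squared error plus L2 weight decay with strength lam (assumed toy values).
    w = np.asarray(w, dtype=float)
    return (np.prod(w) * x - y) ** 2 + lam * np.sum(w ** 2)

def hessian(f, w, h=1e-4):
    # Central finite-difference estimate of the Hessian of f at w.
    w = np.asarray(w, dtype=float)
    n = w.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n)
            ej = np.zeros(n)
            ei[i] = h
            ej[j] = h
            H[i, j] = (f(w + ei + ej) - f(w + ei - ej)
                       - f(w - ei + ej) + f(w - ei - ej)) / (4 * h * h)
    return H

def min_eig_at_origin(num_weights):
    # Smallest Hessian eigenvalue at w = 0 for a chain of num_weights scalar weights.
    return np.linalg.eigvalsh(hessian(loss, np.zeros(num_weights))).min()

# Two weights = one hidden layer; three weights = two hidden layers.
print("one hidden layer :", min_eig_at_origin(2))
print("two hidden layers:", min_eig_at_origin(3))
print("loss at origin vs. at w = (0.9, 0.9, 0.9):",
      loss([0.0, 0.0, 0.0]), loss([0.9, 0.9, 0.9]))
```

In this toy setting, the smallest eigenvalue is negative with two weights, so the origin is a saddle that gradient descent can escape. With three weights the data term contributes nothing to the Hessian at the origin, only the weight decay term remains, and the origin becomes a strict local minimum even though a point such as (0.9, 0.9, 0.9) has lower loss.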
Abstract
This work finds the analytical expression of the global minima of a deep linear network with weight decay and stochastic neurons, a fundamental model for understanding the landscape of neural networks. Our result implies that the origin is a special point in the loss landscape of deep neural networks, where highly nonlinear phenomena emerge. We show that weight decay strongly interacts with the model architecture and can create bad minima at zero in a network with more than one hidden layer, which is qualitatively different from a network with only one hidden layer. Practically, our result implies that common deep learning initialization methods are insufficient to ease the optimization of neural networks in general.
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Applications · Markov Chains and Monte Carlo Methods
Methods: Weight Decay
