Redundancy in Deep Linear Neural Networks
Oriel BenShmuel

TL;DR
This paper argues that, in practice, training deep linear fully-connected networks with conventional optimizers is convex in the same manner as training a single linear layer, challenging the conventional wisdom that depth gives linear networks expressiveness and optimization advantages.
Contribution
It offers a conceptual account of why the training process of a deep linear fully-connected network behaves, in practice, like the convex training of a single linear layer, contrary to the common assumption that depth brings advantages.
Findings
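In practice, training deep linear fully-connected networks with conventional optimizers is convex in the same manner as training a single linear fully-connected layer.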
Deep linear networks do not necessarily outperform single-layer linear models.
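The analysis covers fully-connected linear networks only; convolutional networks fall outside it, though the resulting understanding might shed light on possible constraints of convolutional settings and non-linear architectures.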
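The expressiveness side of these findings can be checked numerically. The following is a minimal NumPy sketch, not code from the paper: the layer widths and the helper names deep_linear_forward and collapse are hypothetical, and the network is assumed to have no biases and no activations. It shows that a stack of linear layers computes the same function as the single matrix obtained by multiplying the layer weights together.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer widths: 4 inputs -> 8 -> 8 -> 3 outputs.
# Assumption: fully-connected layers with no biases and no activations.
widths = [4, 8, 8, 3]
weights = [
    rng.standard_normal((n_out, n_in))
    for n_in, n_out in zip(widths[:-1], widths[1:])
]


def deep_linear_forward(weights, x):
    """Apply each linear layer in turn to the column-stacked inputs x."""
    h = x
    for w in weights:
        h = w @ h
    return h


def collapse(weights):
    """Multiply the layer matrices into one end-to-end matrix W = W_L ... W_1."""
    w_total = np.eye(weights[0].shape[1])
    for w in weights:
        w_total = w @ w_total
    return w_total


x = rng.standard_normal((4, 5))  # five input vectors, one per column
w_total = collapse(weights)
deep_out = deep_linear_forward(weights, x)
single_out = w_total @ x

print("end-to-end matrix shape:", w_total.shape)
print("outputs match:", np.allclose(deep_out, single_out))
```

The sketch only illustrates that depth adds no representable functions in the linear case. It says nothing about the training dynamics, which is the part the paper addresses.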
Abstract
Conventional wisdom states that deep linear neural networks benefit from expressiveness and optimization advantages over a single linear layer. This paper suggests that, in practice, the training process of deep linear fully-connected networks using conventional optimizers is convex in the same manner as a single linear fully-connected layer. This paper aims to explain this claim and demonstrate it. Even though convolutional networks are not aligned with this description, this work aims to attain a new conceptual understanding of fully-connected linear networks that might shed light on the possible constraints of convolutional settings and non-linear architectures.
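For orientation, the standard setup behind this claim can be written out as follows. This is a sketch under assumptions, not the paper's own derivation: the symbols W_l, W, x_i, y_i, the squared-error loss, and the scalar two-layer example are all introduced here for illustration.

```latex
% A depth-L linear network without biases collapses to one matrix W.
f(x) = W_L W_{L-1} \cdots W_1 x = W x,
\qquad W := W_L W_{L-1} \cdots W_1 .

% Squared-error loss over n samples, viewed as a function of W.
% It is a quadratic in W with a positive semidefinite Hessian, hence convex.
\mathcal{L}(W) = \frac{1}{2n} \sum_{i=1}^{n} \lVert W x_i - y_i \rVert_2^2 .

% Viewed as a function of the individual factors, the same loss is in
% general not convex. Scalar example with L = 2 and target 1:
\ell(w_1, w_2) = \tfrac{1}{2} (w_1 w_2 - 1)^2 ,
\qquad \ell(1, 1) = \ell(-1, -1) = 0 ,
\qquad \ell(0, 0) = \tfrac{1}{2} .
% The midpoint (0, 0) of two minimizers has strictly larger loss,
% which a convex function cannot have.
```

The abstract's claim concerns how the training process behaves in practice under conventional optimizers, which is a statement about the optimization trajectory rather than about the factor-space landscape shown in the scalar example.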
Taxonomy
Topics: Neural Networks and Applications · Machine Learning and ELM · Stochastic Gradient Optimization Techniques
