Redundancy in Deep Linear Neural Networks

Oriel BenShmuel

arXiv:2206.04490·cs.LG·June 10, 2022

TL;DR

This paper argues that, in practice, training deep linear fully-connected networks with conventional optimizers is convex in the same manner as training a single linear fully-connected layer, challenging the belief that depth gives linear networks expressiveness and optimization advantages.

Contribution

It offers a conceptual account of why the training process of deep linear fully-connected networks behaves like the convex training of a single linear layer, contrary to the conventional view that depth helps in the linear setting.

Findings

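01  In practice, training deep linear fully-connected networks with conventional optimizers is convex in the same manner as training a single linear fully-connected layer.

02  Depth does not necessarily give linear networks an expressiveness or optimization advantage over a single linear layer.

03  The analysis of fully-connected linear networks might shed light on possible constraints of convolutional settings and non-linear architectures.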

Abstract

Conventional wisdom states that deep linear neural networks benefit from expressiveness and optimization advantages over a single linear layer. This paper suggests that, in practice, the training process of deep linear fully-connected networks using conventional optimizers is convex in the same manner as a single linear fully-connected layer. This paper aims to explain this claim and demonstrate it. Even though convolutional networks are not aligned with this description, this work aims to attain a new conceptual understanding of fully-connected linear networks that might shed light on the possible constraints of convolutional settings and non-linear architectures.
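
As background for the expressiveness part of the abstract, the following minimal sketch checks a standard fact: a stack of linear layers with no activations computes the same function as the single matrix obtained by multiplying the layers together. It is an illustration under stated assumptions, not the paper's experiment, and it says nothing by itself about the convexity of training. The layer widths, the random seed, and the helper names (matmul, random_matrix, deep_linear_forward, collapse) are hypothetical choices made for this example.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def random_matrix(rows, cols, rng):
    """Matrix with entries drawn uniformly from [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def deep_linear_forward(weights, x):
    """Apply the layers one after another: W_L ... W_2 W_1 x (no activations)."""
    h = x
    for w in weights:
        h = matmul(w, h)
    return h


def collapse(weights):
    """Fold all layers into one end-to-end matrix W = W_L ... W_2 W_1."""
    total = weights[0]
    for w in weights[1:]:
        total = matmul(w, total)
    return total


rng = random.Random(0)
dims = [4, 8, 8, 3]  # hypothetical widths: input 4, two hidden layers of 8, output 3
weights = [random_matrix(dims[i + 1], dims[i], rng) for i in range(len(dims) - 1)]
x = random_matrix(dims[0], 5, rng)  # five input vectors stored as columns

deep_out = deep_linear_forward(weights, x)
single_out = matmul(collapse(weights), x)

# Largest elementwise difference between the deep and the collapsed network.
max_gap = max(abs(p - q)
              for row_deep, row_single in zip(deep_out, single_out)
              for p, q in zip(row_deep, row_single))
print(max_gap)
```

The printed gap should be at the level of floating-point rounding, which is why a deep linear network cannot represent any function that a single linear layer of the same input and output size cannot.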

Taxonomy

Topics: Neural Networks and Applications · Machine Learning and ELM · Stochastic Gradient Optimization Techniques