Deep orthogonal linear networks are shallow

Pierre Ablin

arXiv:2011.13831 · stat.ML · November 30, 2020

TL;DR

This paper shows that training a deep orthogonal linear network with Riemannian gradient descent is equivalent to training a shallow, one-layer network, so overparameterization and implicit bias have no effect in this setting.

Contribution

The paper establishes that training a deep orthogonal linear network with Riemannian gradient descent is equivalent to training a one-layer network, so stacking orthogonal layers produces no overparameterization effect.

Findings

01

Training deep orthogonal linear networks with Riemannian gradient descent is equivalent to training a one-layer network.

02

No overparameterization effects are observed in this setting.

03

Implicit bias does not influence training outcomes for these networks.
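A quick numerical check of these findings, written as a minimal sketch under our own assumptions rather than the paper's experimental setup: the loss is a hypothetical least-squares objective 0.5 * ||W X - Y||_F^2, each orthogonal layer takes a Riemannian gradient step using the exponential map, and all layers are updated simultaneously. Under these assumptions, telescoping the layer updates shows that the end-to-end matrix follows a one-layer Riemannian gradient descent run whose step size is multiplied by the number of layers, which is what the comparison below tests. All helper names (`skew`, `expm_skew`, `random_orthogonal`, `product`, `loss_grad`, `shallow_step`, `deep_step`, `compare_trajectories`) are hypothetical, and only NumPy is used.

```python
import numpy as np


def skew(M):
    """Skew-symmetric part (M - M^T) / 2."""
    return 0.5 * (M - M.T)


def expm_skew(A):
    """Matrix exponential of a real skew-symmetric matrix A.

    1j * A is Hermitian, so eigh gives 1j * A = V diag(w) V^H and
    exp(A) = V diag(exp(-1j * w)) V^H, which is real and orthogonal.
    """
    w, V = np.linalg.eigh(1j * A)
    return ((V * np.exp(-1j * w)) @ V.conj().T).real


def random_orthogonal(p, rng):
    """Random p x p orthogonal matrix from a QR factorization."""
    Q, R = np.linalg.qr(rng.standard_normal((p, p)))
    return Q * np.sign(np.diag(R))


def product(Ws, p):
    """End-to-end matrix W_N ... W_1, where Ws[0] is the first layer W_1."""
    W = np.eye(p)
    for Wi in Ws:
        W = Wi @ W
    return W


def loss_grad(W, X, Y):
    """Euclidean gradient of the hypothetical loss 0.5 * ||W X - Y||_F^2."""
    return (W @ X - Y) @ X.T


def shallow_step(W, X, Y, lr):
    """One Riemannian gradient step on the orthogonal group, exponential map."""
    A = skew(loss_grad(W, X, Y) @ W.T)
    return expm_skew(-lr * A) @ W


def deep_step(Ws, X, Y, lr):
    """Simultaneous Riemannian gradient step on every orthogonal factor."""
    p = Ws[0].shape[0]
    G = loss_grad(product(Ws, p), X, Y)
    new_Ws = []
    for i, Wi in enumerate(Ws):
        left = product(Ws[i + 1:], p)  # product of the layers applied after Ws[i]
        right = product(Ws[:i], p)     # product of the layers applied before Ws[i]
        Gi = left.T @ G @ right.T      # Euclidean gradient w.r.t. Ws[i]
        new_Ws.append(expm_skew(-lr * skew(Gi @ Wi.T)) @ Wi)
    return new_Ws


def compare_trajectories(p=4, n_layers=5, lr=0.01, n_steps=20, seed=0):
    """Train deep (step lr per layer) and shallow (step n_layers * lr) side by side."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((p, 10)) / np.sqrt(10)
    Y = random_orthogonal(p, rng) @ X
    Ws = [random_orthogonal(p, rng) for _ in range(n_layers)]
    W_shallow = product(Ws, p)
    for _ in range(n_steps):
        Ws = deep_step(Ws, X, Y, lr)
        W_shallow = shallow_step(W_shallow, X, Y, n_layers * lr)
    W_deep = product(Ws, p)
    return np.max(np.abs(W_deep - W_shallow)), W_deep
```

Calling `diff, W_deep = compare_trajectories()` should give a `diff` at floating-point roundoff level: the deep network's end-to-end matrix and the shallow run stay on the same trajectory. The exponential map is chosen deliberately here; with another retraction the per-layer updates still compose, but not necessarily into a single step of size `n_layers * lr`.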

Abstract

We consider the problem of training a deep orthogonal linear network, which consists of a product of orthogonal matrices, with no non-linearity in-between. We show that training the weights with Riemannian gradient descent is equivalent to training the whole factorization by gradient descent. This means that there is no effect of overparametrization and implicit bias at all in this setting: training such a deep, overparametrized, network is perfectly equivalent to training a one-layer shallow network.
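The mechanism behind this equivalence can be sketched in a few lines. This is a derivation under our own assumptions, not the paper's proof: the network is $W = W_N \cdots W_1$ with each $W_i$ in the orthogonal group $\mathcal{O}(p)$, the Riemannian gradient is written in relative form $\mathrm{skew}(G W^\top)\, W$ with $\mathrm{skew}(M) = (M - M^\top)/2$, each layer is updated with the exponential map, and all layers are updated simultaneously. The notation $P_i = W_N \cdots W_{i+1}$ (with $P_N = I$, $P_0 = W$), $Q_i = W_{i-1} \cdots W_1$, $G = \nabla L(W)$ and $A = \mathrm{skew}(G W^\top)$ is introduced here for illustration. Since $W = P_i W_i Q_i$ and $P_i$ is orthogonal, $Q_i^\top W_i^\top = W^\top P_i$, and $\mathrm{skew}$ commutes with orthogonal conjugation:

```latex
\begin{aligned}
\nabla_{W_i} L &= P_i^\top G\, Q_i^\top, \\
A_i &= \mathrm{skew}\!\left(P_i^\top G\, Q_i^\top W_i^\top\right)
     = \mathrm{skew}\!\left(P_i^\top G\, W^\top P_i\right)
     = P_i^\top A\, P_i, \\
W_i^{+} &= e^{-\eta A_i}\, W_i
         = P_i^\top e^{-\eta A} P_i W_i
         = P_i^\top e^{-\eta A} P_{i-1}, \\
W^{+} &= W_N^{+} \cdots W_1^{+}
       = \left(P_N^\top e^{-\eta A} P_{N-1}\right)\cdots\left(P_1^\top e^{-\eta A} P_0\right)
       = e^{-N\eta A}\, W .
\end{aligned}
```

The product telescopes, so under these assumptions the end-to-end matrix takes exactly one Riemannian gradient step of a one-layer network, with step size $N\eta$; depth only rescales the learning rate.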

Taxonomy

Topics: Neural Networks and Applications · Face and Expression Recognition · Sparse and Compressive Sensing Techniques