Deep orthogonal linear networks are shallow
Pierre Ablin

TL;DR
This paper demonstrates that training deep orthogonal linear networks with Riemannian gradient descent is equivalent to training a shallow one-layer network, showing no overparameterization or implicit bias effects in this setting.
Contribution
The paper proves that Riemannian gradient descent on the weights of a deep orthogonal linear network is equivalent to gradient descent on a single orthogonal matrix, so depth adds no overparameterization effect in this setting.
Findings
Training a deep orthogonal linear network with Riemannian gradient descent is equivalent to training a one-layer shallow network.
Overparameterization through depth has no effect in this setting.
The factorization introduces no implicit bias: the deep network and the shallow one reach the same training outcome.
Abstract
We consider the problem of training a deep orthogonal linear network, which consists of a product of orthogonal matrices, with no non-linearity in-between. We show that training the weights with Riemannian gradient descent is equivalent to training the whole factorization by gradient descent. This means that there is no effect of overparametrization and implicit bias at all in this setting: training such a deep, overparametrized, network is perfectly equivalent to training a one-layer shallow network.
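The equivalence stated in the abstract can be checked numerically. The sketch below is illustrative and not the author's code. It assumes a least-squares loss, an exponential-map retraction on the orthogonal group, and simultaneous updates of all factors. Every function name (skew, expm_skew, deep_step, shallow_step, run_comparison) is hypothetical. Under these assumptions, the product of the depth-N factors trained with step size lr matches a single orthogonal matrix trained with step size N * lr. The abstract does not state this rescaling; it follows from the conjugation identity noted in the comments.

```python
import numpy as np


def skew(a):
    """Skew-symmetric part of a square matrix."""
    return 0.5 * (a - a.T)


def expm_skew(omega):
    """Matrix exponential of a real skew-symmetric matrix.

    1j * omega is Hermitian, so omega = V diag(-1j * lam) V^H and
    exp(omega) = V diag(exp(-1j * lam)) V^H, which is real orthogonal.
    """
    lam, v = np.linalg.eigh(1j * omega)
    return ((v * np.exp(-1j * lam)) @ v.conj().T).real


def product(factors, p):
    """Ordered product W_1 W_2 ... W_N (identity for an empty list)."""
    out = np.eye(p)
    for w in factors:
        out = out @ w
    return out


def euclidean_grad(w, x, y):
    """Gradient of the assumed loss f(W) = 0.5 * ||W x - y||_F^2."""
    return (w @ x - y) @ x.T


def shallow_step(w, x, y, lr):
    """One Riemannian gradient step on a single orthogonal matrix."""
    g = euclidean_grad(w, x, y)
    return expm_skew(-lr * skew(g @ w.T)) @ w


def deep_step(factors, x, y, lr):
    """One simultaneous Riemannian gradient step on every factor."""
    p = factors[0].shape[0]
    g = euclidean_grad(product(factors, p), x, y)
    new_factors = []
    left = np.eye(p)  # A = W_1 ... W_{i-1}
    for i, wi in enumerate(factors):
        right = product(factors[i + 1:], p)  # B = W_{i+1} ... W_N
        grad_i = left.T @ g @ right.T  # chain rule through A W_i B
        # skew(grad_i W_i^T) = A^T skew(g W^T) A, hence each factor
        # contributes one copy of exp(-lr * skew(g W^T)) to the product.
        new_factors.append(expm_skew(-lr * skew(grad_i @ wi.T)) @ wi)
        left = left @ wi
    return new_factors


def run_comparison(depth=4, p=5, n=20, lr=1e-3, steps=25, seed=0):
    """Train deep and shallow models from the same initial product."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((p, n))
    y = rng.standard_normal((p, n))
    # Random orthogonal factors from QR decompositions.
    factors = [np.linalg.qr(rng.standard_normal((p, p)))[0] for _ in range(depth)]
    w_shallow = product(factors, p)
    for _ in range(steps):
        factors = deep_step(factors, x, y, lr)
        w_shallow = shallow_step(w_shallow, x, y, depth * lr)
    return product(factors, p), w_shallow


w_deep, w_shallow = run_comparison()
print("max abs gap:", np.max(np.abs(w_deep - w_shallow)))
```

With these settings the printed gap is expected to sit at floating-point round-off level. A different retraction or loss would need its own check.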
Taxonomy
Topics: Neural Networks and Applications · Face and Expression Recognition · Sparse and Compressive Sensing Techniques
