Deep Network Trainability via Persistent Subspace Orthogonality
Alex Massucco, Davide Murari, Carola-Bibiane Schönlieb

TL;DR
This paper introduces a new architectural approach that maintains gradient stability in deep neural networks by controlling Jacobian properties, enabling effective training of very deep models.
Contribution
It proposes the concept of persistent subspace orthogonality and practical methods to enforce it, improving deep network trainability beyond existing architectures.
Findings
Enforcing Jacobian orthogonality improves gradient preservation.
The proposed methods enable training of deeper networks.
Empirical results validate the theoretical benefits.
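As a brief sketch of why Jacobian orthogonality relates to gradient preservation, consider a network written as a composition of layers. The notation below (layer maps f_k, layer Jacobians J_k, loss L) is assumed for illustration and is not taken from the paper. By the chain rule, the backpropagated gradient is a product of transposed layer Jacobians, so its norm is unchanged whenever every factor is orthogonal.

```latex
% Assumed notation: x_k = f_k(x_{k-1}), \quad J_k = \partial f_k / \partial x_{k-1}, \quad k = 1, \dots, L.
\nabla_{x_{\ell}} \mathcal{L}
  = J_{\ell+1}^{\top} J_{\ell+2}^{\top} \cdots J_{L}^{\top} \, \nabla_{x_{L}} \mathcal{L}

% If each J_k is orthogonal, then J_k J_k^{\top} = I, and for every vector v:
\| J_k^{\top} v \|_2^2 = v^{\top} J_k J_k^{\top} v = \| v \|_2^2
\quad \Longrightarrow \quad
\| \nabla_{x_{\ell}} \mathcal{L} \|_2 = \| \nabla_{x_{L}} \mathcal{L} \|_2 .
```

If instead every factor shrinks or stretches vectors by a fixed ratio, the norm changes geometrically with depth, which is the vanishing or exploding behavior described in the abstract.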
Abstract
Training neural networks via backpropagation is often hindered by vanishing or exploding gradients. In this work, we design architectures that mitigate these issues by analyzing and controlling the network Jacobian. We first provide a unified characterization of a class of networks with orthogonal Jacobians, which includes known architectures and yields new trainable designs. We then introduce the relaxed notion of persistent subspace orthogonality, which applies to a broader class of networks whose Jacobians are isometries only on a non-trivial subspace. We propose practical mechanisms to enforce this condition and show empirically that it is necessary for sufficiently preserving gradient norms during backpropagation, which enables the training of very deep networks. We support our theory with extensive experiments.
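The gradient-norm behavior described in the abstract can be illustrated with a small numerical sketch. It is not the authors' architecture or enforcement mechanism. It is a toy setup that assumes purely linear layers, so each layer's Jacobian is just its weight matrix. The helper names (`random_orthogonal`, `backprop_norms`) and the subspace construction (a fixed orthogonal basis `Q` whose first `k` directions have unit gain and whose remaining directions are contracted) are hypothetical choices made for this example.

```python
import numpy as np

def random_orthogonal(n, rng):
    """Sample an n x n orthogonal matrix via QR of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    # Flipping column signs keeps q orthogonal and removes the QR sign ambiguity.
    return q * np.sign(np.diag(r))

def backprop_norms(jacobians, grad_out):
    """Gradient norm after each backward step through the given layer Jacobians."""
    norms = [np.linalg.norm(grad_out)]
    g = grad_out
    for J in reversed(jacobians):
        g = J.T @ g  # one chain-rule step of backpropagation
        norms.append(np.linalg.norm(g))
    return np.array(norms)

rng = np.random.default_rng(0)
width, depth, k = 16, 200, 4
grad_out = rng.standard_normal(width)

# Case 1: every layer Jacobian is orthogonal, so the norm is preserved exactly.
orth = [random_orthogonal(width, rng) for _ in range(depth)]
orth_norms = backprop_norms(orth, grad_out)

# Case 2: every Jacobian is a slightly contractive multiple of an orthogonal
# matrix, so the norm decays geometrically (vanishing gradient).
scaled_norms = backprop_norms([0.9 * J for J in orth], grad_out)

# Case 3 (assumed toy construction): the Jacobian is an isometry only on the
# k-dimensional subspace spanned by the first k columns of Q and contracts the
# complement. The same subspace is shared by all layers.
Q = random_orthogonal(width, rng)
gains = np.concatenate([np.ones(k), 0.5 * np.ones(width - k)])
J_sub = Q @ np.diag(gains) @ Q.T
sub_norms = backprop_norms([J_sub] * depth, grad_out)

# Norm of the component of grad_out that lies in the preserved subspace.
preserved = np.linalg.norm(Q[:, :k].T @ grad_out)

print("orthogonal, first/last norm:", orth_norms[0], orth_norms[-1])
print("scaled,     first/last norm:", scaled_norms[0], scaled_norms[-1])
print("subspace,   last norm vs preserved part:", sub_norms[-1], preserved)
```

In this toy setting the fully orthogonal stack keeps the gradient norm constant across all 200 layers, and the contractive stack drives it toward zero. The subspace-isometric stack keeps exactly the part of the gradient that lies in the preserved subspace. Real networks are nonlinear and their Jacobians depend on the input, so this only conveys the intuition.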
