Training invariances and the low-rank phenomenon: beyond linear networks
Thien Le, Stefanie Jegelka

TL;DR
This paper extends results on training invariances and the low-rank phenomenon from deep linear networks to nonlinear ReLU networks, showing that in the last few linear layers, weight submatrices whose neurons are stably activated on all training examples converge to low-rank structures.
Contribution
It generalizes previous linear network results to nonlinear ReLU networks, identifying local invariances and conditions for low-rank convergence in deep learning models.
Findings
Weights in submatrices whose neurons are stably activated on all training examples converge to low-rank structures.
Local invariances hold for neurons with stable activation patterns.
Full matrix invariance does not generally hold for ReLU networks.
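The second finding depends on which neurons count as stably activated. A minimal Python sketch follows, assuming this means a ReLU neuron whose pre-activation is strictly positive on every training example; the paper's formal definition may add conditions, such as the pattern persisting throughout training. The helper names `stably_active_mask` and `top_singular_energy` are hypothetical and not taken from the paper.

```python
import numpy as np

def stably_active_mask(W, X):
    """Boolean mask over hidden neurons that are active on every example.

    W: (hidden, d) weights of a bias-free ReLU layer (an assumption here).
    X: (n, d) training inputs to that layer.
    """
    pre_activations = X @ W.T            # shape (n, hidden)
    return (pre_activations > 0).all(axis=0)

def top_singular_energy(M):
    """Fraction of the squared Frobenius norm carried by the top singular value.

    A value close to 1.0 means M is numerically close to rank 1.
    """
    s = np.linalg.svd(M, compute_uv=False)
    return float(s[0] ** 2 / np.sum(s ** 2))

# Toy layer: neuron 1 is never active on X, neurons 0 and 2 always are.
W = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]])
X = np.array([[1.0, 1.0], [2.0, 0.5]])
mask = stably_active_mask(W, X)
W_stable = W[mask]                       # rows of the stably active neurons
```

Tracking `top_singular_energy` of a submatrix restricted to such neurons during training is one way to observe the low-rank trend empirically.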
Abstract
The implicit bias induced by the training of neural networks has become a topic of rigorous study. In the limit of gradient flow and gradient descent with appropriate step size, it has been shown that when one trains a deep linear network with logistic or exponential loss on linearly separable data, the weights converge to rank-1 matrices. In this paper, we extend this theoretical result to the last few linear layers of the much wider class of nonlinear ReLU-activated feedforward networks containing fully-connected layers and skip connections. Similar to the linear case, the proof relies on specific local training invariances, sometimes referred to as alignment, which we show to hold for submatrices where neurons are stably-activated in all training examples, and it reflects empirical results in the literature. We also show this is not true in general for the full matrix of ReLU…
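The linear-network invariance that the abstract builds on can be checked numerically. For a two-layer linear network with end-to-end map W2 W1 trained by gradient flow, the matrix W2ᵀW2 − W1W1ᵀ is conserved; this is a standard form of the invariance and may differ in notation from the statement in the paper. The sketch below is illustrative only: it uses plain gradient descent with a small step on a hypothetical four-point separable dataset with logistic loss, so the conserved quantity drifts slightly, by a second-order amount in the step size per update. The function names and hyperparameters are assumptions, not the authors' setup.

```python
import numpy as np

def end_to_end_grad(P, X, y):
    """Gradient of the mean logistic loss w.r.t. the end-to-end map P (1 x d)."""
    margins = y * (X @ P.ravel())                 # shape (n,)
    coeff = -y / (1.0 + np.exp(margins))          # d/dm log(1 + exp(-m)), times y
    return (coeff[:, None] * X).mean(axis=0, keepdims=True)

def invariance_drift(X, y, hidden=4, lr=1e-3, steps=1000, seed=0):
    """Train f(x) = W2 W1 x and report how far W2^T W2 - W1 W1^T moved."""
    rng = np.random.default_rng(seed)
    W1 = 0.5 * rng.standard_normal((hidden, X.shape[1]))
    W2 = 0.5 * rng.standard_normal((1, hidden))
    D0 = W2.T @ W2 - W1 @ W1.T                    # conserved under gradient flow
    for _ in range(steps):
        G = end_to_end_grad(W2 @ W1, X, y)        # shape (1, d)
        grad_W1, grad_W2 = W2.T @ G, G @ W1.T     # chain rule through W2 W1
        W1 -= lr * grad_W1
        W2 -= lr * grad_W2
    D = W2.T @ W2 - W1 @ W1.T
    return float(np.linalg.norm(D - D0)), float(np.linalg.norm(D0))

# Hypothetical linearly separable data with unit-norm rows.
X = np.array([[1.0, 0.0], [0.6, 0.8], [-1.0, 0.0], [-0.6, -0.8]])
y = np.array([1.0, 1.0, -1.0, -1.0])
drift, scale = invariance_drift(X, y)
print(f"drift of invariant: {drift:.2e} (initial norm {scale:.2e})")
```

Reducing `lr` while keeping `lr * steps` fixed should shrink the reported drift, consistent with exact conservation in the gradient-flow limit.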
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques · Advanced Neuroimaging Techniques and Applications
