InRank: Incremental Low-Rank Learning
Jiawei Zhao, Yifei Zhang, Beidi Chen, Florian Schäfer, Anima Anandkumar

TL;DR
This paper introduces InRank, a training algorithm that incrementally increases the rank of low-rank weight representations during training, improving efficiency while maintaining performance. The method is grounded in theoretical insights into low-rank learning.
Contribution
It removes the impractical infinitesimal-initialization assumption of greedy low-rank learning (GLRL) theory and develops InRank, an algorithm that explicitly enforces low-rank weight updates during training.
Findings
InRank achieves comparable accuracy to full-rank models with significantly lower rank.
InRank reduces training time by up to 37% and model size by 36%.
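The theory is proven for a three-layer linear network with orthogonal initialization; empirically, its predictions hold across a broad range of neural networks (e.g., transformers) and standard training algorithms.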
Abstract
The theory of greedy low-rank learning (GLRL) aims to explain the impressive generalization capabilities of deep learning. It proves that stochastic gradient-based training implicitly regularizes neural networks towards low-rank solutions through a gradual increase of the rank during training. However, there is a gap between theory and practice since GLRL requires an infinitesimal initialization of the weights, which is not practical due to the fact that it is a saddle point. In this work, we remove the assumption of infinitesimal initialization by focusing on cumulative weight updates. We prove the cumulative weight updates follow an incremental low-rank trajectory for arbitrary orthogonal initialization of weights in a three-layer linear network. Empirically, we demonstrate that our theory holds on a broad range of neural networks (e.g., transformers) and standard training algorithms…
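The shift from weights to cumulative weight updates can be made concrete by comparing the singular-value spectrum of a weight matrix with that of its update D_t = W_t - W_0. The sketch below is illustrative only: it uses a synthetic rank-2 perturbation in place of real training, and the effective_rank helper with its 0.99 threshold is an assumption made here, not the criterion used in the paper.

```python
import numpy as np

def effective_rank(mat, ratio=0.99):
    """Smallest k whose top-k singular values carry `ratio` of their sum."""
    s = np.linalg.svd(mat, compute_uv=False)  # sorted in descending order
    total = s.sum()
    if total == 0.0:
        return 0
    cum = np.cumsum(s) / total
    k = int(np.searchsorted(cum, ratio)) + 1
    return min(k, len(s))  # guard against round-off when ratio is near 1

rng = np.random.default_rng(0)
n = 16

# Orthogonal initialization: every singular value of W0 equals 1.
W0, _ = np.linalg.qr(rng.standard_normal((n, n)))

# Synthetic stand-in for training: a rank-2 cumulative update.
delta = 0.5 * rng.standard_normal((n, 2)) @ rng.standard_normal((2, n))
Wt = W0 + delta

rank_of_init = effective_rank(W0)         # full rank under this measure
rank_of_update = effective_rank(Wt - W0)  # at most 2 by construction
print(rank_of_init, rank_of_update)
```

With a non-infinitesimal orthogonal initialization, the weight matrix is full rank from the start, so any low-rank structure only becomes visible in the difference W_t - W_0. In a real experiment, W_t would come from saved training checkpoints rather than from a constructed perturbation.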
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Domain Adaptation and Few-Shot Learning · Machine Learning and ELM
Methods: Attention Is All You Need · Attention Dropout · Dense Connections · Cosine Annealing · Linear Layer · Linear Warmup With Cosine Annealing · Discriminative Fine-Tuning · Layer Normalization · Multi-Head Attention
