InRank: Incremental Low-Rank Learning

Jiawei Zhao; Yifei Zhang; Beidi Chen; Florian Schäfer; Anima Anandkumar

arXiv:2306.11250 · cs.LG · January 2, 2024 · 1 citation

TL;DR

This paper introduces InRank, a training algorithm that keeps neural network weight updates in low-rank form and incrementally increases their rank during training. Building on theoretical insights into low-rank learning, it improves efficiency while maintaining performance.

Contribution

The paper removes the impractical infinitesimal-initialization assumption from low-rank learning theory and develops InRank, an algorithm that explicitly enforces low-rank weight updates during training.
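
To make "explicitly enforcing low-rank weight updates" concrete, here is a minimal NumPy sketch of one way such a parameterization could look. It is not taken from the InRank repository: the class name `LowRankUpdate`, the `grow` method, and the initialization scales are all hypothetical. The weight is written as a fixed initial matrix plus a rank-r product `u @ v`, and the rank is increased by appending factors whose product is zero, so the effective weight does not jump when the rank grows. When to grow the rank is deliberately left out, since the summary above does not specify it.

```python
import numpy as np


class LowRankUpdate:
    """Hypothetical weight W = w_init + u @ v, where u @ v is a rank-r update."""

    def __init__(self, w_init, rank, rng):
        d_out, d_in = w_init.shape
        self.w_init = w_init
        self.rng = rng
        # Small random u and zero v: the update starts at exactly zero.
        self.u = 0.01 * rng.standard_normal((d_out, rank))
        self.v = np.zeros((rank, d_in))

    @property
    def rank(self):
        return self.u.shape[1]

    def weight(self):
        # Effective weight used in the forward pass.
        return self.w_init + self.u @ self.v

    def grow(self, extra):
        """Append `extra` rank-one factors without changing the effective weight."""
        d_out, d_in = self.w_init.shape
        new_u = 0.01 * self.rng.standard_normal((d_out, extra))
        new_v = np.zeros((extra, d_in))  # zero rows keep u @ v unchanged
        self.u = np.concatenate([self.u, new_u], axis=1)
        self.v = np.concatenate([self.v, new_v], axis=0)


rng = np.random.default_rng(0)
layer = LowRankUpdate(rng.standard_normal((6, 4)), rank=1, rng=rng)
layer.v += 0.5                 # stand-in for a few training steps
before = layer.weight().copy()
layer.grow(2)                  # rank 1 -> 3
after = layer.weight()
```

Only `u` and `v` would be trained in this sketch, so the number of trainable parameters scales with the current rank rather than with the full matrix size.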

Findings

1. InRank achieves comparable accuracy to full-rank models with significantly lower rank.

2. InRank reduces training time by up to 37% and model size by 36%.

3. Theoretical results hold across various neural network architectures and training algorithms.

Abstract

The theory of greedy low-rank learning (GLRL) aims to explain the impressive generalization capabilities of deep learning. It proves that stochastic gradient-based training implicitly regularizes neural networks towards low-rank solutions through a gradual increase of the rank during training. However, there is a gap between theory and practice since GLRL requires an infinitesimal initialization of the weights, which is not practical due to the fact that it is a saddle point. In this work, we remove the assumption of infinitesimal initialization by focusing on cumulative weight updates. We prove the cumulative weight updates follow an incremental low-rank trajectory for arbitrary orthogonal initialization of weights in a three-layer linear network. Empirically, we demonstrate that our theory holds on a broad range of neural networks (e.g., transformers) and standard training algorithms…
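
The central quantity in the abstract is the cumulative weight update, that is, the difference between a layer's current weights and its initial weights. The following NumPy sketch shows one way to observe it in a toy setting: a three-layer linear network with orthogonal initialization, trained by plain gradient descent to fit a low-rank target matrix. Everything here is an assumption chosen for illustration (the dimension, the rank-2 target, the learning rate, the 10% relative threshold used to count singular values, and the function names); it is not the paper's experimental setup, and no particular output is claimed.

```python
import numpy as np


def orthogonal(rng, d):
    # Orthogonal matrix from the QR decomposition of a Gaussian matrix.
    q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    return q


def effective_rank(mat, rel_tol=0.1):
    # Number of singular values above rel_tol times the largest one.
    s = np.linalg.svd(mat, compute_uv=False)
    if s[0] == 0.0:
        return 0
    return int(np.sum(s > rel_tol * s[0]))


def run(d=8, target_rank=2, steps=500, lr=0.01, log_every=100, seed=0):
    rng = np.random.default_rng(seed)
    # Low-rank target, rescaled so its largest singular value is 2.
    target = rng.standard_normal((d, target_rank)) @ rng.standard_normal((target_rank, d))
    target *= 2.0 / np.linalg.norm(target, 2)

    w1, w2, w3 = (orthogonal(rng, d) for _ in range(3))
    w2_init = w2.copy()

    losses, ranks = [], []
    for step in range(steps + 1):
        err = w3 @ w2 @ w1 - target
        if step % log_every == 0:
            losses.append(0.5 * float(np.sum(err ** 2)))
            # Rank of the cumulative update of the middle layer.
            ranks.append(effective_rank(w2 - w2_init))
        # Gradients of 0.5 * ||w3 w2 w1 - target||_F^2.
        g1 = (w3 @ w2).T @ err
        g2 = w3.T @ err @ w1.T
        g3 = err @ (w2 @ w1).T
        w1 -= lr * g1
        w2 -= lr * g2
        w3 -= lr * g3
    return losses, ranks


losses, ranks = run()
for i, (loss, rank) in enumerate(zip(losses, ranks)):
    print(f"step {i * 100:4d}  loss {loss:.4f}  rank of cumulative update {rank}")
```

Tracking `w2 - w2_init` rather than `w2` itself is the point of the exercise: with orthogonal initialization the weight matrix is full rank from the start, so any low-rank structure can only show up in the update.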

Code & Models

Repositories

jiaweizzhao/inrank (PyTorch, official)

Taxonomy

Topics: Sparse and Compressive Sensing Techniques · Domain Adaptation and Few-Shot Learning · Machine Learning and ELM

Methods: Attention Is All You Need · Attention Dropout · Dense Connections · Cosine Annealing · Linear Layer · Linear Warmup With Cosine Annealing · Discriminative Fine-Tuning · Layer Normalization · Multi-Head Attention