Memory-Efficient LLM Training by Various-Grained Low-Rank Projection of Gradients
Yezhen Wang, Zhouhao Yang, Brian K Chen, Fanyi Pu, Bo Li, Tianyu Gao, Kenji Kawaguchi

TL;DR
This paper introduces VLoRP, a flexible low-rank gradient projection framework that improves memory efficiency and stability in large language model fine-tuning, supported by theoretical analysis and extensive experiments.
Contribution
VLoRP extends low-rank gradient projection by making projection granularity controllable, which adds a trade-off between memory efficiency and performance beyond the rank hyper-parameter; ProjFactor is an adaptive, memory-efficient optimizer designed for VLoRP; and the paper provides a convergence analysis.
Findings
Finer-grained projections improve stability and efficiency, even under a fixed memory budget.
VLoRP achieves competitive performance with reduced memory.
The paper gives theoretical convergence guarantees under SGD and ProjFactor.
Abstract
Building upon the success of low-rank adapter (LoRA), low-rank gradient projection (LoRP) has emerged as a promising solution for memory-efficient fine-tuning. However, existing LoRP methods typically treat each row of the gradient matrix as the default projection unit, leaving the role of projection granularity underexplored. In this work, we propose a novel framework, VLoRP, that extends low-rank gradient projection by introducing an additional degree of freedom for controlling the trade-off between memory efficiency and performance, beyond the rank hyper-parameter. Through this framework, we systematically explore the impact of projection granularity, demonstrating that finer-grained projections lead to enhanced stability and efficiency even under a fixed memory budget. Regarding the optimization for VLoRP, we present ProjFactor, an adaptive memory-efficient optimizer, that…
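To make "projection granularity" concrete, here is a minimal NumPy sketch. It is not the authors' implementation. It assumes that a finer granularity means splitting each gradient row into `granularity` equal chunks and projecting every chunk with one shared Gaussian random matrix. The function names, the `granularity` parameter, and the choice of a Gaussian projection are all hypothetical and used only for illustration.

```python
import numpy as np


def project_gradient(grad, rank, granularity=1, seed=0):
    """Project a gradient matrix to low rank (illustrative sketch only).

    Assumption: granularity c splits each of the n rows into c chunks,
    so the projection units are the n*c rows of the reshaped matrix.
    """
    n, m = grad.shape
    if m % granularity != 0:
        raise ValueError("granularity must divide the number of columns")
    rng = np.random.default_rng(seed)
    # Row-major reshape: each original row becomes `granularity` shorter rows.
    units = grad.reshape(n * granularity, m // granularity)
    # Gaussian projection scaled so that E[proj @ proj.T] is the identity.
    proj = rng.standard_normal((m // granularity, rank)) / np.sqrt(rank)
    return units @ proj, proj


def project_back(low, proj, shape):
    """Map the stored low-rank state back to the original gradient shape."""
    return (low @ proj.T).reshape(shape)


grad = np.arange(8 * 64, dtype=float).reshape(8, 64)

# Row-level projection: 8 units of length 64, rank 16.
coarse, p_coarse = project_gradient(grad, rank=16, granularity=1)
# Finer projection: 32 units of length 16, rank 4.
fine, p_fine = project_gradient(grad, rank=4, granularity=4)

print(coarse.shape, fine.shape)  # (8, 16) (32, 4)
print(coarse.size == fine.size)  # True: same number of stored entries
```

Under these assumptions the stored projected gradient has n * granularity * rank entries, so lowering the rank while raising the granularity by the same factor keeps that storage constant. This is one way to read "a fixed memory budget" in the abstract.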
Taxonomy
Topics: Advanced Surface Polishing Techniques · Advanced Neural Network Applications · Advancements in Photolithography Techniques
Methods: Stochastic Gradient Descent · Adapter
