Beyond SGD, Without SVD: Proximal Subspace Iteration LoRA with Diagonal Fractional K-FAC
Abdulla Jasem Almansoori, Maria Ivanova, Andrey Veprikov, Aleksandr Beznosikov, Samuel Horv\'ath, Martin Tak\'a\v{c}

TL;DR
This paper introduces LoRSum, a memory-efficient method for low-rank model adaptation that improves upon LoRA by using proximal sub-problems and structured preconditioning, achieving better performance with less memory.
Contribution
We propose LoRSum, a novel proximal subroutine for low-rank adaptation that unifies several preconditioning methods and enhances LoRA with memory-efficient structured metrics.
Findings
LoRSum matches or outperforms LoRA on various tasks.
Structured metrics like K-FAC improve efficiency while maintaining performance.
The method reduces memory usage compared to full SVD projections.
Abstract
Low-Rank Adaptation (LoRA) fine-tunes large models by learning low-rank updates on top of frozen weights, dramatically reducing trainable parameters and memory. In this work, we address the gap between training with full steps with low-rank projections (SVDLoRA) and LoRA fine-tuning. We propose LoRSum, a memory-efficient subroutine that closes this gap for gradient descent by casting LoRA optimization as a proximal sub-problem and solving it efficiently with alternating least squares updates, which we prove to be an implicit block power method. We recover several recently proposed preconditioning methods for LoRA as special cases, and show that LoRSum can also be used for updating a low-rank momentum. In order to address full steps with preconditioned gradient descent, we propose a scaled variant of LoRSum that uses structured metrics such as K-FAC and Shampoo, and we show that storing…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsStochastic Gradient Optimization Techniques · Tensor decomposition and applications · Model Reduction and Neural Networks
