Beyond SGD, Without SVD: Proximal Subspace Iteration LoRA with Diagonal Fractional K-FAC

Abdulla Jasem Almansoori; Maria Ivanova; Andrey Veprikov; Aleksandr Beznosikov; Samuel Horváth; Martin Takáč

arXiv:2602.16456·cs.LG·February 19, 2026

Open Access

TL;DR

This paper introduces LoRSum, a memory-efficient method for low-rank model adaptation that improves upon LoRA by using proximal sub-problems and structured preconditioning, matching or outperforming LoRA while using less memory than full SVD projections.

Contribution

We propose LoRSum, a novel proximal subroutine for low-rank adaptation that unifies several preconditioning methods and enhances LoRA with memory-efficient structured metrics.

Findings

01. LoRSum matches or outperforms LoRA on various tasks.

02. Structured metrics like K-FAC improve efficiency while maintaining performance.

03. The method reduces memory usage compared to full SVD projections.

Abstract

Low-Rank Adaptation (LoRA) fine-tunes large models by learning low-rank updates on top of frozen weights, dramatically reducing trainable parameters and memory. In this work, we address the gap between training with full steps with low-rank projections (SVDLoRA) and LoRA fine-tuning. We propose LoRSum, a memory-efficient subroutine that closes this gap for gradient descent by casting LoRA optimization as a proximal sub-problem and solving it efficiently with alternating least squares updates, which we prove to be an implicit block power method. We recover several recently proposed preconditioning methods for LoRA as special cases, and show that LoRSum can also be used for updating a low-rank momentum. In order to address full steps with preconditioned gradient descent, we propose a scaled variant of LoRSum that uses structured metrics such as K-FAC and Shampoo, and we show that storing…
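
The link between alternating least squares and a block power method mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's LoRSum routine: it is plain alternating least squares for the generic problem min ||B Aᵀ − M||_F over rank-r factors, and the names `als_low_rank`, `M`, `A`, and `B` are hypothetical choices made for this example. Each sweep replaces span(B) with span(M Mᵀ B), which is one step of subspace (block power) iteration, so the factors approach the best rank-r approximation without computing an SVD inside the loop.

```python
import numpy as np


def als_low_rank(M, A, num_iters=30):
    """Alternating least squares for min_{B, A} ||B A^T - M||_F^2.

    Illustrative sketch only (not the paper's algorithm).
    M: (m, n) target matrix; A: (n, r) initial right factor.
    Returns B of shape (m, r) and A of shape (n, r).
    """
    for _ in range(num_iters):
        # Fix A and solve the least-squares problem for B:
        #   B = M A (A^T A)^{-1}
        B = np.linalg.solve(A.T @ A, (M @ A).T).T
        # Fix B and solve for A:
        #   A = M^T B (B^T B)^{-1}
        A = np.linalg.solve(B.T @ B, (M.T @ B).T).T
    return B, A


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic matrix with a clear gap after the third singular value
    # (an assumption made so the iteration converges quickly).
    U, _ = np.linalg.qr(rng.standard_normal((20, 6)))
    V, _ = np.linalg.qr(rng.standard_normal((15, 6)))
    s = np.array([10.0, 8.0, 6.0, 1.0, 0.5, 0.1])
    M = (U * s) @ V.T

    B, A = als_low_rank(M, rng.standard_normal((15, 3)))

    # Compare against the optimal rank-3 error, computed from the known
    # singular values purely as a reference.
    print("ALS error:    ", np.linalg.norm(B @ A.T - M))
    print("optimal error:", np.sqrt(np.sum(s[3:] ** 2)))
```

The sketch forms `M` explicitly for clarity. Only the products `M @ A` and `M.T @ B` are needed inside the loop, so a target that is itself a low-rank term plus a gradient step could in principle be applied without materializing the full matrix; how the paper actually organizes this is not shown here.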

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Tensor decomposition and applications · Model Reduction and Neural Networks