Lotus: Efficient LLM Training by Randomized Low-Rank Gradient Projection with Adaptive Subspace Switching

Tianhao Miao; Zhongyuan Bao; Lejun Zhang

arXiv:2602.01233·cs.LG·February 3, 2026

TL;DR

Lotus is a novel training method for large-scale language models that reduces training time and memory usage by adaptively switching gradient subspaces, outperforming existing low-rank gradient projection techniques.

Contribution

The paper introduces an adaptive subspace-switching criterion that improves training efficiency and model performance in large-scale models compared with prior low-rank gradient methods.

Findings

01 · 30% reduction in training time

02 · 40% decrease in memory consumption

03 · Outperforms baseline in pre-training and fine-tuning

Abstract

Training efficiency in large-scale models is typically assessed through memory consumption, training time, and model performance. Current methods often exhibit trade-offs among these metrics, as optimizing one generally degrades at least one of the others. Addressing this trade-off remains a central challenge in algorithm design. While GaLore enables memory-efficient training by updating gradients in a low-rank subspace, it incurs a comparable extra training-time cost due to the Singular Value Decomposition (SVD) performed on gradients. In this paper, we propose Lotus, a method that resolves this trade-off by simply modifying the projection process. We propose a criterion that quantifies the displacement of the unit gradient to enable efficient transitions between low-rank gradient subspaces. Experimental results indicate that Lotus is the most efficient method, achieving a 30% reduction…
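
The abstract contrasts an SVD-based projection with a cheaper, modified projection and a switching criterion based on the displacement of the unit gradient. The NumPy sketch below illustrates those three ideas in general terms only. It is not the authors' implementation: the function names are hypothetical, the randomized projector is a standard Gaussian-sketch range finder assumed here as a stand-in for the paper's randomized projection, and the displacement measure (distance between consecutive normalized gradients) is one plausible reading of the criterion, whose exact form the truncated abstract does not give.

```python
import numpy as np


def svd_projector(grad, rank):
    """SVD-style projector: top-`rank` left singular vectors of the gradient."""
    u, _, _ = np.linalg.svd(grad, full_matrices=False)
    return u[:, :rank]


def randomized_projector(grad, rank, rng):
    """Assumed stand-in: Gaussian-sketch range finder that avoids a full SVD."""
    # Compress the gradient's column space with a random (n x rank) test matrix.
    sketch = grad @ rng.standard_normal((grad.shape[1], rank))
    # Orthonormalize the sketch; q has shape (m, rank) with orthonormal columns.
    q, _ = np.linalg.qr(sketch)
    return q


def unit_gradient_displacement(prev_grad, grad, eps=1e-12):
    """Hypothetical criterion: Frobenius distance between normalized gradients."""
    prev_unit = prev_grad / (np.linalg.norm(prev_grad) + eps)
    unit = grad / (np.linalg.norm(grad) + eps)
    return float(np.linalg.norm(unit - prev_unit))


rng = np.random.default_rng(0)
grad = rng.standard_normal((64, 32))       # stand-in for one layer's gradient

proj = randomized_projector(grad, 4, rng)  # (64, 4) projection matrix
low_rank_grad = proj.T @ grad              # (4, 32): optimizer state would live here
full_update = proj @ low_rank_grad         # mapped back to (64, 32)
```

In a training loop, a sketch like this would recompute `proj` only when `unit_gradient_displacement` exceeds some threshold, instead of on a fixed schedule. The threshold and the comparison rule are assumptions; the paper defines the actual criterion.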

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Machine Learning and ELM