MLorc: Momentum Low-rank Compression for Memory Efficient Large Language Model Adaptation

Wei Shen; Zhang Yaxiang; Minhui Huang; Mengfan Xu; Jiawei Zhang; Cong Shen

arXiv:2506.01897·cs.LG·April 28, 2026

TL;DR

MLorc introduces a memory-efficient training method for large language models by compressing momentum, enabling full-parameter learning with reduced memory use and improved performance over existing methods.

Contribution

The paper proposes MLorc, a novel momentum low-rank compression technique that preserves training dynamics and enhances memory efficiency in LLM fine-tuning.

Findings

01 MLorc outperforms other memory-efficient methods in experiments.

02 It matches or exceeds full fine-tuning performance at small ranks.

03 MLorc generalizes well across different optimizers.

Abstract

With the increasing size of large language models (LLMs), full-parameter fine-tuning imposes substantial memory demands. To alleviate this, we propose a novel memory-efficient training paradigm called Momentum Low-rank compression (MLorc). The key idea of MLorc is to compress and reconstruct the momentum of matrix parameters during training to reduce memory consumption. Compared to LoRA, MLorc avoids enforcing a fixed-rank constraint on weight update matrices and thus enables full-parameter learning. Compared to GaLore, MLorc directly compresses the momentum rather than the gradients, thereby better preserving the training dynamics of full-parameter fine-tuning. We provide a theoretical guarantee for its convergence under mild assumptions. Empirically, MLorc consistently outperforms other memory-efficient training methods, matches or even exceeds the performance of full fine-tuning at small ranks…
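
The compress-and-reconstruct idea in the abstract can be illustrated with a minimal sketch. The abstract does not say which compression operator or optimizer is used, so the code below makes two loudly stated assumptions: compression is a plain truncated SVD, and the optimizer is a simple exponential-moving-average momentum update. The names `compress`, `reconstruct`, and `momentum_step` are hypothetical and are not taken from the paper or any library. The point is only that the optimizer state is stored as two thin factors, while the weight update itself is not restricted to a fixed rank.

```python
import numpy as np

def compress(m, rank):
    """Assumed compressor: keep the top-`rank` singular triplets of m."""
    u, s, vt = np.linalg.svd(m, full_matrices=False)
    # Fold singular values into the left factor; only two thin matrices are stored.
    return u[:, :rank] * s[:rank], vt[:rank, :]

def reconstruct(left, right):
    """Rebuild a dense approximation of the momentum from its factors."""
    return left @ right

def momentum_step(weight, grad, factors, lr=0.1, beta=0.9, rank=4):
    """One EMA-momentum step whose persistent state is low-rank (illustrative only)."""
    left, right = factors
    # Reconstruct the previous momentum, then mix in the full-rank gradient.
    m = beta * reconstruct(left, right) + (1.0 - beta) * grad
    # The update uses the dense momentum, so the weight change is not rank-limited.
    new_weight = weight - lr * m
    # Only the compressed factors are kept until the next step.
    return new_weight, compress(m, rank)

# Toy problem (an assumption for the demo): minimize 0.5 * ||W - T||_F^2.
rng = np.random.default_rng(0)
target = rng.standard_normal((32, 16))
weight = np.zeros((32, 16))
rank = 4
factors = (np.zeros((32, rank)), np.zeros((rank, 16)))

for _ in range(200):
    grad = weight - target  # gradient of the toy objective
    weight, factors = momentum_step(weight, grad, factors, rank=rank)

final_error = np.linalg.norm(weight - target)
stored = factors[0].size + factors[1].size  # entries kept as optimizer state
dense = weight.size                         # entries a dense momentum would need
print(f"stored momentum entries: {stored} vs dense: {dense}")
print(f"final error: {final_error:.4f}")
```

In this toy setting the momentum state occupies 4 × (32 + 16) = 192 entries instead of the 512 a dense momentum matrix would need. The sketch says nothing about the paper's convergence guarantee or its reported results; it only makes the mechanism concrete.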
