Train Small, Infer Large: Memory-Efficient LoRA Training for Large Language Models

Jun Zhang; Jue Wang; Huan Li; Lidan Shou; Ke Chen; Yang You; Guiming Xie; Xuejian Gong; Kunlong Zhou

arXiv:2502.13533·cs.LG·March 18, 2025

TL;DR

LoRAM is a memory-efficient LoRA training scheme for large language models: it trains low-rank adapter matrices on a pruned (small) model, then recovers them for use with the original (large) model at inference. This significantly reduces training memory requirements while maintaining strong performance across tasks.

Contribution

The paper proposes LoRAM, a novel method that trains on pruned models to obtain low-rank matrices, enabling large LLM fine-tuning with substantially reduced memory footprint.

Findings

1. LoRAM reduces memory usage by more than 15× compared to full fine-tuning.
2. It enables training of 70B-parameter models on a single GPU with 20 GB of memory.
3. LoRAM achieves performance gains over baseline models and standard LoRA.

Abstract

Large Language Models (LLMs) have significantly advanced natural language processing with exceptional task generalization capabilities. Low-Rank Adaptation (LoRA) offers a cost-effective fine-tuning solution, freezing the original model parameters and training only lightweight, low-rank adapter matrices. However, the memory footprint of LoRA is largely dominated by the original model parameters. To mitigate this, we propose LoRAM, a memory-efficient LoRA training scheme founded on the intuition that many neurons in over-parameterized LLMs have low training utility but are essential for inference. LoRAM presents a unique twist: it trains on a pruned (small) model to obtain pruned low-rank matrices, which are then recovered and utilized with the original (large) model for inference. Additionally, minimal-cost continual pre-training, performed by the model publishers in advance, aligns the…
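The train-small, infer-large idea in the abstract can be illustrated with a toy example on a single linear layer. This is a minimal NumPy sketch, not the authors' implementation: it assumes structured pruning that drops whole output rows of the weight matrix, and it assumes that "recovery" means zero-filling the low-rank factor at the pruned positions. The names `kept_rows`, `recover`, and all shapes are hypothetical, and random values stand in for trained adapter weights.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, rank = 8, 6, 2

# Frozen weight of the original (large) layer.
W_full = rng.standard_normal((d_out, d_in))

# Assumption: structured pruning keeps only a subset of output rows.
kept_rows = np.array([0, 2, 3, 6])
W_small = W_full[kept_rows]

# LoRA factors sized for the pruned layer. Random values stand in
# for the result of training against W_small.
B_small = rng.standard_normal((len(kept_rows), rank))
A = rng.standard_normal((rank, d_in))


def recover(B_pruned, kept, full_rows):
    """Expand a pruned LoRA factor to full size (assumed zero-fill recovery)."""
    B_recovered = np.zeros((full_rows, B_pruned.shape[1]))
    B_recovered[kept] = B_pruned
    return B_recovered


# Inference uses the original weight plus the recovered low-rank update.
B_full = recover(B_small, kept_rows, d_out)
W_merged = W_full + B_full @ A

x = rng.standard_normal(d_in)
y_small = (W_small + B_small @ A) @ x  # what the pruned model computed in training
y_large = W_merged @ x                 # what the large model computes at inference
```

Under these assumptions, the large model reproduces the adapted outputs on the rows that survived pruning, while the pruned rows keep the original model's behavior. During training, only `W_small` and the small factors need to be held in memory.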

Code & Models

Repositories

junzhang-zj/LoRAM (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis

Methods: Adapter · Pruning