Train Small, Infer Large: Memory-Efficient LoRA Training for Large Language Models
Jun Zhang, Jue Wang, Huan Li, Lidan Shou, Ke Chen, Yang You, Guiming Xie, Xuejian Gong, Kunlong Zhou

TL;DR
LoRAM is a memory-efficient training scheme for large language models: it trains low-rank adapter matrices on a pruned model and applies them to the original model at inference, significantly reducing memory requirements while maintaining strong performance across tasks.
Contribution
The paper proposes LoRAM, a novel method that trains on pruned models to obtain low-rank matrices, enabling fine-tuning of large LLMs with a substantially reduced memory footprint.
Findings
LoRAM reduces memory usage by over 15 times compared to full fine-tuning.
It enables training of 70B parameter models on a single 20G GPU.
LoRAM achieves performance gains over baseline models and standard LoRA.
Abstract
Large Language Models (LLMs) have significantly advanced natural language processing with exceptional task generalization capabilities. Low-Rank Adaptation (LoRA) offers a cost-effective fine-tuning solution, freezing the original model parameters and training only lightweight, low-rank adapter matrices. However, the memory footprint of LoRA is largely dominated by the original model parameters. To mitigate this, we propose LoRAM, a memory-efficient LoRA training scheme founded on the intuition that many neurons in over-parameterized LLMs have low training utility but are essential for inference. LoRAM presents a unique twist: it trains on a pruned (small) model to obtain pruned low-rank matrices, which are then recovered and utilized with the original (large) model for inference. Additionally, minimal-cost continual pre-training, performed by the model publishers in advance, aligns the…
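The train-small, infer-large idea described in the abstract can be illustrated for a single linear layer. The following is a minimal NumPy sketch, not the authors' implementation: it assumes structured pruning that keeps a subset of output rows and input columns, and it assumes that "recovery" means zero-padding the pruned low-rank factors back to the original shape. The helper names `prune` and `recover`, the layer sizes, and the kept indices are all hypothetical, and the "trained" factors are random stand-ins because no actual training is run. The continual pre-training alignment step is not modeled.

```python
import numpy as np

def prune(weight, keep_rows, keep_cols):
    """Structured pruning (assumed form): keep selected output rows and input columns."""
    return weight[np.ix_(keep_rows, keep_cols)]

def recover(b_small, a_small, keep_rows, keep_cols, d_out, d_in):
    """Zero-pad pruned low-rank factors to the original layer shape (assumed recovery rule)."""
    rank = b_small.shape[1]
    b_full = np.zeros((d_out, rank))
    a_full = np.zeros((rank, d_in))
    b_full[keep_rows, :] = b_small   # pruned rows stay zero
    a_full[:, keep_cols] = a_small   # pruned columns stay zero
    return b_full, a_full

rng = np.random.default_rng(0)
d_out, d_in, rank = 8, 6, 2                  # hypothetical layer sizes
w_full = rng.normal(size=(d_out, d_in))      # frozen original (large) weight

keep_rows = np.array([0, 2, 3, 5, 7])        # hypothetical kept output neurons
keep_cols = np.array([0, 1, 4, 5])           # hypothetical kept input neurons

# Training would use only the small weight, which is what saves memory.
w_small = prune(w_full, keep_rows, keep_cols)

# Random stand-ins for low-rank factors trained against w_small.
b_small = rng.normal(size=(len(keep_rows), rank))
a_small = rng.normal(size=(rank, len(keep_cols)))

# Inference uses the original weight plus the recovered low-rank update.
b_full, a_full = recover(b_small, a_small, keep_rows, keep_cols, d_out, d_in)
w_infer = w_full + b_full @ a_full
```

Under these assumptions, the update only touches positions that survived pruning, so the pruned neurons keep their original weights at inference while the kept block matches what the small model learned.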
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
Methods: Adapter · Pruning
