Scalable Parameter and Memory Efficient Pretraining for LLM: Recent Algorithmic Advances and Benchmarking

Athanasios Glentis; Jiaxiang Li; Qiulin Shang; Andi Han; Ioannis Tsaknakis; Quan Wei; Mingyi Hong

arXiv:2505.22922 · cs.LG · May 30, 2025

TL;DR

This paper reviews and benchmarks recent algorithmic advances in memory and parameter-efficient pretraining of large language models, proposing techniques to improve performance while reducing resource requirements.

Contribution

The paper provides a comprehensive survey, benchmarks existing methods, and introduces two techniques, weight refactorization and momentum reset, to improve efficient pretraining.

Findings

1. Full-rank training yields best performance with proper hyperparameters.
2. High-rank updates improve low-rank approaches.
3. Proposed techniques reduce memory usage and improve perplexity on 1B models.

Abstract

Fueled by their remarkable ability to tackle diverse tasks across multiple domains, large language models (LLMs) have grown at an unprecedented rate, with some recent models containing trillions of parameters. This growth is accompanied by substantial computational challenges, particularly regarding the memory and compute resources required for training and fine-tuning. Numerous approaches have been explored to address these issues, such as LoRA. While these methods are effective for fine-tuning, their application to pre-training is significantly more challenging due to the need to learn vast datasets. Motivated by this issue, we aim to address the following questions: Can parameter- or memory-efficient methods enhance pre-training efficiency while achieving performance comparable to full-model training? How can the performance gap be narrowed? To this end, the contributions of this…
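The abstract names LoRA as a representative parameter-efficient method. As a minimal sketch of the general idea (not the paper's benchmark code), the snippet below adds a trainable rank-r product B·A to a frozen weight W and compares trainable parameter counts. The function names, the `scale` argument, and the layer sizes are illustrative assumptions.

```python
# Minimal LoRA-style sketch in pure Python (no dependencies).
# All names and sizes are illustrative assumptions, not taken from the paper.

def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, scale=1.0):
    """Compute (W + scale * B @ A) @ x without forming the full update matrix.

    W: frozen d_out x d_in weight; A: r x d_in and B: d_out x r are trainable.
    """
    base = matvec(W, x)            # frozen full-rank path
    low = matvec(B, matvec(A, x))  # low-rank path: project down to r, then up
    return [b + scale * l for b, l in zip(base, low)]

def param_counts(d_out, d_in, r):
    """Trainable parameters: full matrix vs. the two rank-r factors."""
    return d_out * d_in, r * (d_in + d_out)

# Tiny example: a 2x3 weight with a rank-1 update.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
A = [[1.0, 1.0, 1.0]]   # r = 1, d_in = 3
B = [[0.5], [0.0]]      # d_out = 2, r = 1
y = lora_forward(W, A, B, [1.0, 2.0, 3.0])

# Hypothetical 4096 x 4096 layer with rank 16.
full, low_rank = param_counts(4096, 4096, 16)
print(y, full, low_rank)
```

Because the update B·A has rank at most r, the trainable parameter count drops from d_out·d_in to r·(d_in + d_out), but so does the space of weight changes the layer can express in a single step.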

Code & Models

Repositories

optimai-lab/memory_efficient_pretraining (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning