BlockLLM: Memory-Efficient Adaptation of LLMs by Selecting and Optimizing the Right Coordinate Blocks
Amrutha Varshini Ramesh, Vignesh Ganapathiraman, Issam H. Laradji, Mark Schmidt

TL;DR
BlockLLM introduces a memory-efficient method for large language model adaptation by selecting and optimizing a small subset of parameters, achieving state-of-the-art results with significantly reduced memory usage.
Contribution
It proposes a novel block coordinate descent-based approach that selectively updates a small subset of parameters without altering model architecture or training procedures.
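To make the idea of block coordinate descent concrete, here is a minimal sketch on a toy least-squares problem. It is not the authors' implementation: the greedy rule (update the `k` blocks with the largest gradient norm), the function names, and the plain gradient step are all assumptions chosen for illustration.

```python
import numpy as np

def loss(A, b, x):
    """Least-squares objective 0.5 * ||Ax - b||^2."""
    r = A @ x - b
    return 0.5 * float(r @ r)

def block_coordinate_descent(A, b, n_blocks=10, k=1, steps=200):
    """Toy block coordinate descent (hypothetical helper, not BlockLLM itself).

    Each step updates only the k coordinate blocks whose gradient norm is
    largest; every other coordinate is left untouched.
    """
    n = A.shape[1]
    x = np.zeros(n)
    # Partition the coordinates into contiguous blocks.
    blocks = np.array_split(np.arange(n), n_blocks)
    # Step size 1/L, where L is the squared spectral norm of A.
    lr = 1.0 / np.linalg.norm(A, 2) ** 2
    for _ in range(steps):
        grad = A.T @ (A @ x - b)
        norms = np.array([np.linalg.norm(grad[idx]) for idx in blocks])
        chosen = np.argsort(norms)[-k:]  # assumed greedy selection rule
        for j in chosen:
            idx = blocks[j]
            x[idx] -= lr * grad[idx]
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
b = rng.standard_normal(50)
x = block_coordinate_descent(A, b, n_blocks=10, k=1, steps=500)
```

Only one block in ten is touched per step here, so any per-parameter optimizer state would be needed for just that block; in an LLM setting the blocks would plausibly correspond to layers or parameter tensors rather than slices of a vector.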
Findings
Achieves state-of-the-art results on GLUE benchmarks while updating fewer than 5% of the parameters.
Reduces memory footprint significantly during training of large models.
Maintains competitive performance on pretrained Llama models with reduced memory requirements.
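A rough back-of-the-envelope calculation shows why updating a small fraction of parameters shrinks optimizer memory. The sketch below assumes an Adam-style optimizer that keeps two moment buffers per trainable parameter, 4-byte values, and a hypothetical 1-billion-parameter model; it ignores weights, gradients, and activations, so it is an estimate of optimizer state only, not of the paper's reported savings.

```python
def adam_state_bytes(n_trainable, bytes_per_value=4):
    """Bytes of Adam-style optimizer state: two moment buffers per trainable parameter."""
    return 2 * n_trainable * bytes_per_value

total_params = 1_000_000_000               # hypothetical model size
trainable_params = total_params * 5 // 100  # 5% of parameters selected for updates

full_state = adam_state_bytes(total_params)
sparse_state = adam_state_bytes(trainable_params)

print(f"full optimizer state:   {full_state / 1e9:.1f} GB")
print(f"5% subset optimizer state: {sparse_state / 1e9:.1f} GB")
```

Under these assumptions the optimizer state scales linearly with the number of trainable parameters, so selecting 5% of them cuts that component by a factor of 20.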
Abstract
Training large language models (LLMs) for pretraining or adapting to new tasks and domains has become increasingly critical as their applications expand. However, as the model and the data sizes grow, the training process presents significant memory challenges, often requiring a prohibitive amount of GPU memory that may not be readily available. Existing methods such as low-rank adaptation (LoRA) add trainable low-rank matrix factorizations, altering the training dynamics and limiting the model's parameter search to a low-rank subspace. GaLore, a more recent method, employs Gradient Low-Rank Projection to reduce the memory footprint in the full-parameter training setting. However, GaLore can only be applied to the subset of LLM layers that satisfy the "reversibility" property, which limits its applicability. In response to these challenges, we introduce BlockLLM, an approach…
Taxonomy
Topics: Neural Networks and Applications · Advanced Data Storage Technologies
Methods: LLaMA
