Grass: Compute Efficient Low-Memory LLM Training with Structured Sparse Gradients
Aashiq Muhamed, Oscar Li, David Woodruff, Mona Diab, Virginia Smith

TL;DR
Grass introduces a structured sparse gradient projection method that reduces memory and computational costs in large language model training, enabling efficient pretraining of large models on limited hardware with significant throughput gains.
Contribution
The paper presents Grass, a sparse projection technique that lowers memory use and raises training throughput for large language models relative to methods that rely on dense projection matrices, while remaining competitive with full-rank training.
Findings
Enables pretraining of a 13B-parameter LLaMA model on a single 40GB GPU
Achieves up to 2x throughput improvement on 8-GPU systems
Remains competitive with full-rank training
Abstract
Large language model (LLM) training and finetuning are often bottlenecked by limited GPU memory. While existing projection-based optimization methods address this by projecting gradients into a lower-dimensional subspace to reduce optimizer state memory, they typically rely on dense projection matrices, which can introduce computational and memory overheads. In this work, we propose Grass (GRAdient Structured Sparsification), a novel approach that leverages sparse projections to transform gradients into structured sparse updates. This design not only significantly reduces memory usage for optimizer states but also minimizes gradient memory footprint, computation, and communication costs, leading to substantial throughput improvements. Extensive experiments on pretraining and finetuning tasks demonstrate that Grass achieves competitive performance to full-rank training and existing…
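To make the idea of a structured sparse update concrete, here is a minimal sketch, not the authors' implementation. It assumes the sparse projection simply selects r of the m rows of a gradient matrix, so the optimizer state is r x n instead of m x n and only the selected rows of the weights change. The function names (select_rows, project, sparse_momentum_step), the uniform random row selection, and the use of plain SGD with momentum in place of the optimizer used in the paper are all assumptions made for illustration.

```python
import random


def select_rows(num_rows, r, seed=0):
    """Pick r distinct row indices uniformly at random.

    Assumption: uniform sampling is only one possible selection rule.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(range(num_rows), r))


def project(grad, rows):
    """Compress an m x n gradient to r x n by keeping only the selected rows."""
    return [list(grad[i]) for i in rows]


def sparse_momentum_step(weights, grad, rows, momentum, lr=0.1, beta=0.9):
    """One SGD-with-momentum step whose state lives in the r x n subspace.

    `momentum` has one row per selected index, so optimizer memory scales
    with r rather than with the full number of rows m. Rows of `weights`
    that were not selected are left untouched (a row-sparse update).
    """
    compressed = project(grad, rows)
    for k, i in enumerate(rows):
        for j in range(len(compressed[k])):
            momentum[k][j] = beta * momentum[k][j] + compressed[k][j]
            weights[i][j] -= lr * momentum[k][j]
    return weights, momentum


if __name__ == "__main__":
    m, n, r = 6, 3, 2
    rows = select_rows(m, r)
    weights = [[1.0] * n for _ in range(m)]
    grad = [[float(i + 1)] * n for i in range(m)]
    momentum = [[0.0] * n for _ in range(r)]
    weights, momentum = sparse_momentum_step(weights, grad, rows, momentum)
    print("selected rows:", rows)
    print("optimizer state shape:", len(momentum), "x", len(momentum[0]))
```

Because the compressed gradient and the optimizer state both have only r rows, the sketch never materializes a dense m x n update, which is the property the abstract credits for the savings in memory, computation, and communication.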
Taxonomy
Topics: Advanced Neural Network Applications · Machine Learning and Algorithms · Neural Networks and Applications
Methods: LLaMA
