Grass: Compute Efficient Low-Memory LLM Training with Structured Sparse Gradients

Aashiq Muhamed; Oscar Li; David Woodruff; Mona Diab; Virginia Smith

arXiv:2406.17660 · cs.LG · June 26, 2024

TL;DR

Grass introduces a structured sparse gradient projection method that reduces memory and computational costs in large language model training, enabling efficient pretraining of large models on limited hardware with significant throughput gains.

Contribution

The paper presents Grass, a novel sparse projection technique that improves memory efficiency and training throughput for large language models, outperforming dense and existing projection methods.

Findings

01. Enables pretraining of a 13B-parameter LLaMA model on a single 40GB GPU.

02. Achieves up to 2× throughput improvement on 8-GPU systems.

03. Maintains performance competitive with full-rank training.
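To see why gradient and optimizer-state memory decide whether a 13B model fits on one 40GB GPU, here is some back-of-the-envelope arithmetic. It is a sketch under loud assumptions, not the paper's memory accounting: every value is stored as 2-byte bf16, a projection keeps a hypothetical fraction r/m = 1/8 of each gradient and Adam state, and activations, fp32 master weights, and unprojected layers are ignored. The helper names `gb` and `memory_budget` are illustrative only.

```python
def gb(num_values: float, bytes_per_value: int = 2) -> float:
    """Decimal gigabytes for num_values entries (default: 2-byte bf16)."""
    return num_values * bytes_per_value / 1e9


def memory_budget(params: float, rank_ratio: float = 1.0) -> dict:
    """Rough weights / gradient / Adam-state memory, all stored in bf16.

    rank_ratio = r / m is the (hypothetical) fraction of rows a projection
    keeps; 1.0 means full-rank training. Activations, fp32 master weights,
    and layers that are not projected are ignored (all assumptions).
    """
    return {
        "weights": gb(params),
        "gradients": gb(params * rank_ratio),
        "adam_states": gb(2 * params * rank_ratio),  # first and second moments
    }


full = memory_budget(13e9)                          # full-rank training
projected = memory_budget(13e9, rank_ratio=1 / 8)   # hypothetical r/m = 1/8

for name, budget in (("full-rank", full), ("projected", projected)):
    print(name, budget, "total GB:", sum(budget.values()))
```

Under these assumptions, full-rank training needs 26 GB for weights, 26 GB for gradients, and 52 GB for Adam states (104 GB in total), and the weights alone already occupy 26 GB of a 40GB card. Shrinking the gradient and optimizer-state terms in proportion to r/m is the kind of saving that projection-based methods target.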

Abstract

Large language model (LLM) training and finetuning are often bottlenecked by limited GPU memory. While existing projection-based optimization methods address this by projecting gradients into a lower-dimensional subspace to reduce optimizer state memory, they typically rely on dense projection matrices, which can introduce computational and memory overheads. In this work, we propose Grass (GRAdient Structured Sparsification), a novel approach that leverages sparse projections to transform gradients into structured sparse updates. This design not only significantly reduces memory usage for optimizer states but also minimizes gradient memory footprint, computation, and communication costs, leading to substantial throughput improvements. Extensive experiments on pretraining and finetuning tasks demonstrate that Grass achieves performance competitive with full-rank training and existing…
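The abstract contrasts dense projection matrices with sparse ones. The sketch below assumes the simplest sparse case: an m × r projection P whose columns are one-hot. Then P^T G reduces to gathering r rows of an m × n gradient G, and P maps an r × n optimizer update back to a row-sparse m × n update. The sketch also shows that, for a linear layer Y = X W^T, those r gradient rows can be computed directly from X and dY without forming the full gradient, which is one way such a projection can save compute. The largest-row-norm selection rule, the unit scaling, and every function name are hypothetical illustrations, not the authors' code in aashiqmuhamed/grass.

```python
import math


def matmul(a, b):
    """Plain-Python product of a (p x q) and a (q x s) list-of-lists matrix."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def select_rows(grad, r):
    """Hypothetical selection rule: keep the r rows of G with the largest L2 norm."""
    norms = [math.sqrt(sum(x * x for x in row)) for row in grad]
    top = sorted(range(len(grad)), key=lambda i: norms[i], reverse=True)[:r]
    return sorted(top)


def one_hot_projection(rows, m):
    """Dense m x r matrix P whose k-th column is the unit vector e_{rows[k]}."""
    return [[1.0 if rows[k] == i else 0.0 for k in range(len(rows))] for i in range(m)]


def sparse_project(rows, grad):
    """P^T G for a one-hot P: just gather the selected rows (no multiplies)."""
    return [list(grad[i]) for i in rows]


def sparse_unproject(rows, update, m):
    """P @ U for a one-hot P: scatter the r rows back; all other rows stay zero."""
    n = len(update[0])
    full = [[0.0] * n for _ in range(m)]
    for k, i in enumerate(rows):
        full[i] = list(update[k])
    return full


def selected_weight_grad(x, dy, rows):
    """For Y = X W^T, dL/dW = dY^T X; compute only the rows listed in `rows`.

    Costs r*b*n multiply-adds instead of m*b*n for the full gradient.
    """
    dy_sel = [[dy_row[i] for i in rows] for dy_row in dy]  # b x r
    return matmul(transpose(dy_sel), x)                     # r x n


# Toy linear layer: W is m x n = 4 x 3, batch size b = 2, keep r = 2 rows.
x = [[1.0, 2.0, 0.5], [0.0, -1.0, 3.0]]               # b x n layer input
dy = [[0.1, 2.0, -0.3, 1.5], [0.2, -1.0, 0.0, 2.5]]   # b x m output gradient
m, r = 4, 2

full_grad = matmul(transpose(dy), x)        # m x n full weight gradient
rows = select_rows(full_grad, r)
P = one_hot_projection(rows, m)

dense = matmul(transpose(P), full_grad)     # dense projection P^T G
sparse = sparse_project(rows, full_grad)    # same result via row gather
direct = selected_weight_grad(x, dy, rows)  # same rows, full gradient never formed
update = sparse_unproject(rows, sparse, m)  # row-sparse m x n update
print(rows, sparse, update)
```

In this toy, an Adam-style optimizer acting on the projected gradient would keep each of its two moment buffers at r × n = 2 × 3 entries (12 in total) instead of m × n = 4 × 3 (24 in total), and only rows 1 and 3 of W would change in that step.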

Code & Models

Repositories

aashiqmuhamed/grass (official)

Videos

GRASS: Compute Efficient Low-Memory LLM Training with Structured Sparse Gradients (Underline)

Taxonomy

Topics: Advanced Neural Network Applications · Machine Learning and Algorithms · Neural Networks and Applications

Methods: LLaMA