Practical offloading for fine-tuning LLM on commodity GPU via learned sparse projectors

Siyuan Chen; Zhuofeng Wang; Zelong Guan; Yudong Liu; Phillip B. Gibbons

arXiv:2406.10181 · cs.DC · February 11, 2025

TL;DR

This paper introduces LSP-Offload, a framework that enables efficient fine-tuning of large language models on commodity GPUs by using learned sparse projectors to reduce communication overhead and improve parallelism.

Contribution

The paper presents a novel offloading framework with learned sparse compressors and a layer-wise communication schedule for near-native speed LLM fine-tuning on commodity hardware.
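The benefit of overlapping communication with computation can be illustrated with a generic two-stage pipeline model. This is a minimal sketch under stated assumptions, not the paper's actual schedule: the function names and the per-layer timings are hypothetical, and the model assumes one GPU compute stream and one CPU-GPU link, with each layer's transfer starting as soon as its gradient is ready and the link is idle.

```python
def serial_time(compute, comm):
    """Total time when every transfer waits for all work before it (no overlap)."""
    return sum(compute) + sum(comm)


def overlapped_time(compute, comm):
    """Total time when layer i's transfer overlaps with layer i+1's compute.

    compute[i] and comm[i] are hypothetical per-layer durations in the same unit.
    """
    gpu_free = 0.0   # time at which the GPU finishes the current layer
    link_free = 0.0  # time at which the CPU-GPU link finishes its current transfer
    for c, t in zip(compute, comm):
        gpu_free += c
        # The transfer starts once the gradient exists and the link is idle.
        link_free = max(link_free, gpu_free) + t
    return link_free


# Hypothetical timings: 4 layers, 2 units of compute and 1 unit of transfer each.
compute = [2, 2, 2, 2]
comm = [1, 1, 1, 1]
print(serial_time(compute, comm))      # 12
print(overlapped_time(compute, comm))  # 9.0
```

In this toy setting only the last layer's transfer is exposed; when transfers are slower than compute, the link becomes the bottleneck, which is why a schedule like this is paired with compression.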

Findings

01 Enables fine-tuning of a 1.3B-parameter model on a 4GB GPU

02 Reduces fine-tuning time by up to 62.5% compared to the state of the art

03 Achieves near-native speed with minimal accuracy loss

Abstract

Fine-tuning large language models (LLMs) requires significant memory, often exceeding the capacity of a single GPU. A common solution to this memory challenge is offloading compute and data from the GPU to the CPU. However, this approach is hampered by the limited bandwidth of commodity hardware, which constrains communication between the CPU and GPU, and by slower matrix multiplications on the CPU. In this paper, we present an offloading framework, LSP-Offload, that enables near-native speed LLM fine-tuning on commodity hardware through learned sparse projectors. Our data-driven approach involves learning efficient sparse compressors that minimize communication with minimal precision loss. Additionally, we introduce a novel layer-wise communication schedule to maximize parallelism between communication and computation. As a result, our framework can fine-tune a 1.3 billion parameter…
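To make the idea of compressing gradients with sparse projectors concrete, here is a minimal sketch in plain Python. It rests on assumptions that the text above does not spell out: a two-sided projection of an m x n gradient G down to a d x d matrix, projectors with a single nonzero per row, and fixed values where the paper would use learned ones. All names (`compress`, `decompress`, `payload_bytes`) are hypothetical.

```python
def compress(G, P, Q, d):
    """Return the d x d matrix P^T G Q for sparse projectors P and Q.

    G is an m x n list of lists. P has m entries and Q has n entries; each
    entry is (column, value), i.e. one nonzero per projector row (assumption).
    """
    S = [[0.0] * d for _ in range(d)]
    for i, row in enumerate(G):
        p_col, p_val = P[i]
        for j, g in enumerate(row):
            q_col, q_val = Q[j]
            S[p_col][q_col] += p_val * g * q_val
    return S


def decompress(S, P, Q):
    """Return the m x n matrix P S Q^T, mapping the small matrix back to full size."""
    return [[p_val * S[p_col][q_col] * q_val for (q_col, q_val) in Q]
            for (p_col, p_val) in P]


def payload_bytes(rows, cols, bytes_per_value=2):
    """Bytes needed to transfer a dense rows x cols matrix (2 bytes assumes fp16)."""
    return rows * cols * bytes_per_value


# Toy example: a 4 x 4 gradient of ones compressed to 2 x 2.
G = [[1.0] * 4 for _ in range(4)]
P = [(0, 1.0), (0, 1.0), (1, 1.0), (1, 1.0)]  # rows 0-1 -> column 0, rows 2-3 -> column 1
Q = [(0, 1.0), (0, 1.0), (1, 1.0), (1, 1.0)]
S = compress(G, P, Q, d=2)
print(S)                                       # [[4.0, 4.0], [4.0, 4.0]]
print(payload_bytes(4, 4), payload_bytes(2, 2))  # 32 8
```

Only the small matrix crosses the CPU-GPU link, so the transfer shrinks from m x n values to d x d values; the approximation error depends on how well the projectors fit the gradients, which is presumably where learning them from data helps.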

Code & Models

Repositories

gulang2019/lsp-offload (PyTorch, official)
