Practical offloading for fine-tuning LLM on commodity GPU via learned sparse projectors
Siyuan Chen, Zhuofeng Wang, Zelong Guan, Yudong Liu, Phillip B. Gibbons

TL;DR
This paper introduces LSP-Offload, a framework that enables efficient fine-tuning of large language models on commodity GPUs by using learned sparse projectors to reduce communication overhead and improve parallelism.
Contribution
The paper presents a novel offloading framework with learned sparse compressors and a layer-wise communication schedule for near-native speed LLM fine-tuning on commodity hardware.
Findings
Enables fine-tuning of a 1.3B-parameter model on a 4GB GPU
Reduces fine-tuning time by up to 62.5% compared to state-of-the-art offloading frameworks
Achieves near-native fine-tuning speed with minimal accuracy loss
Abstract
Fine-tuning large language models (LLMs) requires significant memory, often exceeding the capacity of a single GPU. A common solution to this memory challenge is offloading compute and data from the GPU to the CPU. However, this approach is hampered by the limited bandwidth of commodity hardware, which constrains communication between the CPU and GPU, and by slower matrix multiplications on the CPU. In this paper, we present an offloading framework, LSP-Offload, that enables near-native speed LLM fine-tuning on commodity hardware through learned sparse projectors. Our data-driven approach involves learning efficient sparse compressors that minimize communication with minimal precision loss. Additionally, we introduce a novel layer-wise communication schedule to maximize parallelism between communication and computation. As a result, our framework can fine-tune a 1.3 billion parameter…
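The compression idea in the abstract can be illustrated with a small sketch. It assumes a two-sided projection: a gradient matrix G of shape m x n is reduced to the d x d matrix P^T G Q before it crosses the CPU-GPU link, and the result is expanded back as P S Q^T. That form, the helper names (make_sparse_projector, compress, decompress), and the random projector values are assumptions made for illustration. In the paper the projectors are learned from data, and the CPU-side optimizer step and the layer-wise schedule are omitted here.

```python
import random

def make_sparse_projector(rows, cols, nnz_per_row, seed=0):
    """Hypothetical helper: a rows x cols sparse matrix stored as {(i, j): value}.
    Values are random here (assumption); the paper learns them from data."""
    rng = random.Random(seed)
    entries = {}
    for i in range(rows):
        for j in rng.sample(range(cols), nnz_per_row):
            entries[(i, j)] = rng.gauss(0.0, 1.0)
    return entries

def compress(grad, P, Q, d):
    """Return the d x d matrix P^T @ grad @ Q, visiting only nonzeros of P and Q."""
    n = len(grad[0])
    tmp = [[0.0] * n for _ in range(d)]          # P^T @ grad, shape d x n
    for (i, a), p in P.items():
        for k in range(n):
            tmp[a][k] += p * grad[i][k]
    out = [[0.0] * d for _ in range(d)]          # tmp @ Q, shape d x d
    for (k, b), q in Q.items():
        for a in range(d):
            out[a][b] += tmp[a][k] * q
    return out

def decompress(small, P, Q, m, n):
    """Return the m x n matrix P @ small @ Q^T."""
    d = len(small)
    tmp = [[0.0] * d for _ in range(m)]          # P @ small, shape m x d
    for (i, a), p in P.items():
        for b in range(d):
            tmp[i][b] += p * small[a][b]
    out = [[0.0] * n for _ in range(m)]          # tmp @ Q^T, shape m x n
    for (k, b), q in Q.items():
        for i in range(m):
            out[i][k] += tmp[i][b] * q
    return out

# Toy sizes (assumed): an 8 x 6 gradient compressed to 2 x 2.
m, n, d = 8, 6, 2
grad = [[float(i + k) for k in range(n)] for i in range(m)]
P = make_sparse_projector(m, d, nnz_per_row=1, seed=1)   # m x d
Q = make_sparse_projector(n, d, nnz_per_row=1, seed=2)   # n x d
small = compress(grad, P, Q, d)          # what would be sent to the CPU
restored = decompress(small, P, Q, m, n) # what would be applied on the GPU
```

In this toy setting only d * d = 4 values would cross the link instead of m * n = 48, and because each projector row holds a single nonzero, compression and decompression cost far less than dense matrix products. How much accuracy survives depends on how well the projectors fit the gradients, which is the part the paper addresses by learning them.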
