MemAscend: System Memory Optimization for SSD-Offloaded LLM Fine-Tuning
Yong-Cheng Liaw, Shuo-Han Chen

TL;DR
MemAscend optimizes system memory management for SSD-offloaded large language model (LLM) fine-tuning, reducing peak system memory (CPU DRAM) usage and enabling larger models on limited hardware.
Contribution
It introduces a framework that addresses system memory fragmentation and related system-level overheads in SSD offloading for LLM fine-tuning, improving scalability and cost-efficiency.
Findings
Reduces peak system-memory consumption by 55.7% on average
Enables training of larger models and longer contexts
Improves scalability and reduces hardware costs
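To make the headline number concrete, here is a minimal sketch of the arithmetic behind a 55.7% reduction in peak system memory. The function name `peak_after_reduction` and the 256 GiB baseline are hypothetical, chosen only for illustration; they are not figures from the paper, and an average reduction need not hold for any particular model or configuration.

```python
def peak_after_reduction(baseline_gib: float, reduction: float = 0.557) -> float:
    """Return peak system memory (GiB) after a fractional reduction.

    The default of 0.557 is the average reduction reported in the findings;
    baseline_gib is whatever peak the unoptimized run reaches.
    """
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be between 0 and 1")
    return baseline_gib * (1.0 - reduction)


# Hypothetical baseline for illustration only, not a measurement from the paper.
baseline = 256.0
print(f"{peak_after_reduction(baseline):.1f} GiB")  # prints "113.4 GiB"
```

Read the other way, a machine with a fixed amount of DRAM could hold a workload whose unoptimized peak is roughly 1 / (1 - 0.557) ≈ 2.26 times larger, assuming the average reduction carried over unchanged.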
Abstract
Owing to the huge success of generative artificial intelligence (AI), large language models (LLMs) have emerged as a core subclass, underpinning applications such as question answering, text generation, and code completion. While fine-tuning these models on domain-specific data can yield significant performance gains, it also poses daunting computational challenges, especially for researchers and small organizations with limited hardware resources. Although SSD offloading (e.g., ZeRO-Infinity) has emerged as a viable strategy to overcome the GPU memory barrier by leveraging both system memory (i.e., CPU DRAM) and storage space (i.e., solid-state devices, SSDs), its design primarily targets model-centric performance issues. As a result, key system-level issues, including system memory fragmentation, inefficient pinned buffer allocation, peak CPU usage spikes, and file system overhead,…
Taxonomy
Topics: Advanced Data Storage Technologies · Advancements in Photolithography Techniques · VLSI and Analog Circuit Testing
