Cost-Efficient LLM Training with Lifetime-Aware Tensor Offloading via GPUDirect Storage
Ziqi Yuan, Haoyang Zhang, Yirui Eric Zhou, Apoorve Mohan, I-Hsin Chung, Seetharami Seelam, Jian Huang

TL;DR
This paper introduces TERAIO, a lifetime-aware tensor offloading framework that expands effective GPU memory for large language model training by offloading inactive tensors to SSDs, improving training performance by 1.47x on average.
Contribution
The paper presents TERAIO, a novel lifetime-aware tensor offloading framework that plans tensor migration from profiled tensor lifetimes and executes it via GPUDirect Storage to improve LLM training performance.
Findings
TERAIO improves LLM training performance by 1.47x on average.
It achieves 80.7% of the ideal performance that would be obtained with unlimited GPU memory.
The framework effectively offloads inactive tensors to SSDs without stalling training.
Abstract
We present the design and implementation of a new lifetime-aware tensor offloading framework for GPU memory expansion using low-cost PCIe-based solid-state drives (SSDs). Our framework, TERAIO, is developed explicitly for large language model (LLM) training with multiple GPUs and multiple SSDs. Its design is driven by our observation that the active tensors take only a small fraction (1.7% on average) of allocated GPU memory in each LLM training iteration, while the inactive tensors are usually large and will not be used for a long period of time, creating ample opportunities for offloading/prefetching tensors to/from slow SSDs without stalling the GPU training process. TERAIO accurately estimates the lifetime (active period of time in GPU memory) of each tensor by profiling the first few iterations of the training process. With the tensor lifetime analysis, TERAIO will generate an…
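The idea of using profiled tensor lifetimes to decide what to offload can be illustrated with a minimal sketch. This is not TERAIO's actual algorithm or API: the `Access` and `Migration` records, the `plan_migrations` function, and the bandwidth parameters are all hypothetical names introduced here. The sketch assumes a tensor is worth offloading only when the gap between two consecutive uses is long enough to cover both the SSD write and the SSD read, and it ignores bandwidth contention between concurrent transfers and GPU memory capacity.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Access:
    """One profiled use of a tensor (hypothetical trace record)."""
    tensor_id: str
    time_ms: float  # time at which a kernel touches the tensor


@dataclass(frozen=True)
class Migration:
    """One planned offload/prefetch pair for an inactive period."""
    tensor_id: str
    offload_start_ms: float   # start writing to SSD right after the last use
    prefetch_start_ms: float  # start reading back just in time for the next use


def plan_migrations(
    accesses: List[Access],
    sizes_bytes: Dict[str, int],
    write_bw: float,  # assumed SSD write bandwidth, bytes per second
    read_bw: float,   # assumed SSD read bandwidth, bytes per second
) -> List[Migration]:
    """Pick inactive periods long enough to hide a full offload and prefetch."""
    # Collect the access times of each tensor from the profiled trace.
    times: Dict[str, List[float]] = {}
    for access in accesses:
        times.setdefault(access.tensor_id, []).append(access.time_ms)

    plan: List[Migration] = []
    for tensor_id, stamps in times.items():
        stamps.sort()
        size = sizes_bytes[tensor_id]
        write_ms = size / write_bw * 1000.0
        read_ms = size / read_bw * 1000.0
        # Each pair of consecutive uses bounds one inactive period.
        for last_use, next_use in zip(stamps, stamps[1:]):
            if next_use - last_use >= write_ms + read_ms:
                plan.append(Migration(tensor_id, last_use, next_use - read_ms))

    plan.sort(key=lambda m: m.offload_start_ms)
    return plan
```

For example, a 1 GB tensor used at 0 ms and again at 1000 ms, with assumed bandwidths of 2 GB/s for writes and 4 GB/s for reads, needs 500 ms to offload and 250 ms to prefetch, so the sketch schedules the offload at 0 ms and the prefetch at 750 ms. A tensor of the same size that is reused after only 90 ms stays in GPU memory.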
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies · Cloud Computing and Resource Management
Methods: 1x1 Convolution · Non Maximum Suppression · Self-Supervised Deep Supervision · ZeRO-Infinity · Convolution · ZeRO-Offload · SSD
