STAlloc: Enhancing Memory Efficiency in Large-Scale Model Training with Spatio-Temporal Planning
Zixiao Huang, Junhao Hu, Hao Lin, Chunyang Zhu, Yueran Tang, Quanlu Zhang, Zhen Guo, Zhenhua Li, Shengen Yan, Zhenhua Zhu, Guohao Dai, Yu Wang

TL;DR
STAlloc is a novel GPU memory allocator that significantly reduces memory fragmentation in large-scale model training by combining offline planning with online allocation, leading to improved efficiency and throughput.
Contribution
The paper introduces a memory allocator for deep learning frameworks, based on spatio-temporal planning, that minimizes fragmentation and improves training performance.
Findings
Reduces the memory fragmentation ratio by up to 100%.
Improves training throughput by up to 32.5%.
Effective for both dense and Mixture-of-Experts models.
Abstract
The rapid scaling of large language models (LLMs) has significantly increased GPU memory pressure, which is further aggravated by training optimization techniques such as virtual pipeline and recomputation that disrupt tensor lifespans and introduce considerable memory fragmentation. Such fragmentation stems from the use of online GPU memory allocators in popular deep learning frameworks like PyTorch, which disregard tensor lifespans. As a result, this inefficiency can waste as much as 43% of memory and trigger out-of-memory errors, undermining the effectiveness of optimization methods. To address this, we introduce STAlloc, a GPU memory allocator for deep learning frameworks that reduces fragmentation by exploiting the spatial and temporal regularity in memory allocation behaviors of training workloads. STAlloc introduces a novel paradigm that combines offline planning with online…
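To make the idea of lifespan-aware offline planning concrete, here is a minimal sketch of a generic static memory planner. It is not STAlloc's algorithm, which the truncated abstract does not describe in detail. It assumes each tensor's size and lifespan are known ahead of time (for example, from a profiled training iteration), and the `Tensor` fields, the greedy largest-first heuristic, and all function names are hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tensor:
    name: str
    size: int   # bytes
    start: int  # step at which the tensor is allocated
    end: int    # step at which it is freed (exclusive)


def lifetimes_overlap(a: Tensor, b: Tensor) -> bool:
    """True if both tensors are live during at least one common step."""
    return a.start < b.end and b.start < a.end


def plan_offsets(tensors):
    """Greedy offline plan (illustrative heuristic, not STAlloc's method):
    place the largest tensors first, each at the lowest address that does
    not collide with an already placed tensor live at the same time."""
    by_name = {t.name: t for t in tensors}
    placed = {}  # name -> start offset in the planned memory pool
    for t in sorted(tensors, key=lambda x: (-x.size, x.start)):
        # Address ranges held by placed tensors whose lifespans overlap t's.
        busy = sorted(
            (placed[n], placed[n] + by_name[n].size)
            for n in placed
            if lifetimes_overlap(t, by_name[n])
        )
        offset = 0
        for lo, hi in busy:
            if offset + t.size <= lo:
                break  # t fits in the gap before this range
            offset = max(offset, hi)
        placed[t.name] = offset
    return placed


def peak_bytes(tensors, plan):
    """Size of the pool the plan needs."""
    return max(plan[t.name] + t.size for t in tensors)


def live_lower_bound(tensors):
    """Largest sum of live tensor sizes at any step; no plan can beat this."""
    steps = {t.start for t in tensors}
    return max(
        sum(t.size for t in tensors if t.start <= s < t.end) for s in steps
    )


# Hypothetical trace: two long-lived activations and two short-lived temporaries.
tensors = [
    Tensor("act0", 4, 0, 6),
    Tensor("tmp0", 2, 1, 3),
    Tensor("act1", 4, 2, 5),
    Tensor("tmp1", 2, 3, 5),
]
plan = plan_offsets(tensors)
print(plan, peak_bytes(tensors, plan), live_lower_bound(tensors))
```

In this trace, `tmp0` is freed before `tmp1` is allocated, so the planner can assign both the same address range. An online allocator that does not know lifespans makes placement decisions without that information, which is the source of fragmentation the abstract describes.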
