GMLake: Efficient and Transparent GPU Memory Defragmentation for Large-scale DNN Training with Virtual Memory Stitching
Cong Guo, Rui Zhang, Jiale Xu, Jingwen Leng, Zihan Liu, Ziyu Huang, Minyi Guo, Hao Wu, Shouren Zhao, Junping Zhao, Ke Zhang

TL;DR
GMLake introduces a virtual memory stitching framework that significantly reduces GPU memory usage and fragmentation during large-scale DNN training. It improves efficiency while staying transparent to existing models, which need no modification.
Contribution
GMLake presents a novel GPU memory management framework using virtual memory stitching to mitigate fragmentation and enhance memory utilization in large-scale DNN training.
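Virtual memory stitching can be pictured as placing several non-contiguous physical chunks behind one contiguous virtual address range, so a request can be served even when no single free chunk is large enough. The sketch below only simulates that bookkeeping in Python. `StitchingPool`, `PhysicalChunk`, and `StitchedBlock` are hypothetical names, and virtual addresses are plain integers. On real GPUs, the CUDA driver's virtual memory management API (for example `cuMemAddressReserve`, `cuMemCreate`, and `cuMemMap`) supplies the primitives such a scheme would build on; this is not GMLake's implementation.

```python
from dataclasses import dataclass


@dataclass
class PhysicalChunk:
    # Hypothetical stand-in for one physical allocation (e.g. a cuMemCreate handle).
    chunk_id: int
    size: int
    in_use: bool = False


@dataclass
class StitchedBlock:
    # A contiguous virtual range backed by one or more physical chunks.
    virtual_base: int
    chunks: list

    @property
    def size(self) -> int:
        return sum(c.size for c in self.chunks)


class StitchingPool:
    """Toy model of virtual memory stitching (hypothetical, not GMLake's code)."""

    def __init__(self, chunk_sizes):
        self.chunks = [PhysicalChunk(i, s) for i, s in enumerate(chunk_sizes)]
        self.next_va = 0  # virtual address space is assumed to be plentiful

    def _reserve_va(self, size):
        # Hand out a fresh contiguous virtual range of the given size.
        base = self.next_va
        self.next_va += size
        return base

    def allocate(self, size):
        idle = [c for c in self.chunks if not c.in_use]
        # 1) Prefer a single idle chunk that fits (best fit).
        fitting = sorted((c for c in idle if c.size >= size), key=lambda c: c.size)
        if fitting:
            picked = [fitting[0]]
        else:
            # 2) Otherwise stitch idle chunks, largest first, until the request is covered.
            picked, total = [], 0
            for c in sorted(idle, key=lambda c: c.size, reverse=True):
                picked.append(c)
                total += c.size
                if total >= size:
                    break
            if total < size:
                raise MemoryError(f"cannot satisfy a request of {size} units")
        for c in picked:
            c.in_use = True
        return StitchedBlock(self._reserve_va(sum(c.size for c in picked)), picked)

    def release(self, block):
        # Physical chunks return to the pool; the virtual range is simply abandoned here.
        for c in block.chunks:
            c.in_use = False


pool = StitchingPool([2, 3, 4])  # sizes in arbitrary units
big = pool.allocate(6)           # no single chunk fits, so chunks 2 and 1 are stitched
print(big.size, [c.chunk_id for c in big.chunks])
```

The largest-first greedy choice is an assumption made for brevity. It can over-cover a request (6 units requested, 7 mapped), which a practical allocator would likely reduce by splitting physical chunks.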
Findings
Reduces GPU memory usage by up to 25 GB.
Decreases memory fragmentation by up to 33%.
Ensures seamless, transparent operation for DNN frameworks.
Abstract
Large-scale deep neural networks (DNNs), such as large language models (LLMs), have revolutionized the artificial intelligence (AI) field and become increasingly popular. However, training or fine-tuning such models requires substantial computational power and resources, where the memory capacity of a single acceleration device like a GPU is one of the most important bottlenecks. Owing to the prohibitively large overhead (e.g., ) of GPUs' native memory allocator, DNN frameworks like PyTorch and TensorFlow adopt a caching allocator that maintains a memory pool with a splitting mechanism for fast memory (de)allocation. Unfortunately, the caching allocator's efficiency degrades quickly for popular memory reduction techniques such as recomputation, offloading, distributed training, and low-rank adaptation. The primary reason is that those memory reduction techniques introduce…
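The splitting behavior of a caching allocator, and why it fragments under irregular allocation patterns, can be illustrated with a minimal single-segment sketch. It assumes best-fit placement and coalescing of adjacent free blocks. `CachingSegment` and `Block` are hypothetical names; this is not the actual PyTorch or TensorFlow allocator code.

```python
class Block:
    # One region inside a cached segment: [offset, offset + size).
    def __init__(self, offset, size, free=True):
        self.offset, self.size, self.free = offset, size, free


class CachingSegment:
    """Toy model of one cached segment with block splitting (hypothetical)."""

    def __init__(self, size):
        self.blocks = [Block(0, size)]  # kept sorted by offset

    def allocate(self, size):
        candidates = [b for b in self.blocks if b.free and b.size >= size]
        if not candidates:
            return None  # a real caching allocator would fall back to the native allocator
        best = min(candidates, key=lambda b: b.size)  # best fit
        idx = self.blocks.index(best)
        if best.size > size:
            # Split: the unused tail stays in the pool as a smaller free block.
            self.blocks.insert(idx + 1, Block(best.offset + size, best.size - size))
            best.size = size
        best.free = False
        return best

    def release(self, block):
        block.free = True
        # Coalesce adjacent free blocks so neighbouring holes can be reused together.
        merged = []
        for b in self.blocks:
            if merged and merged[-1].free and b.free:
                merged[-1].size += b.size
            else:
                merged.append(b)
        self.blocks = merged

    def free_units(self):
        return sum(b.size for b in self.blocks if b.free)

    def largest_free(self):
        return max((b.size for b in self.blocks if b.free), default=0)


seg = CachingSegment(8)
a, b, c, d = (seg.allocate(2) for _ in range(4))
seg.release(a)
seg.release(c)
print(seg.free_units(), seg.largest_free(), seg.allocate(3))
```

After `a` and `c` are released, 4 units are free but the largest hole is only 2 units, so a 3-unit request cannot be served from the cached segment. Gaps of this kind are the fragmentation that the abstract attributes to irregular allocation patterns.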
Taxonomy
Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Stochastic Gradient Optimization Techniques
