GMLake: Efficient and Transparent GPU Memory Defragmentation for Large-scale DNN Training with Virtual Memory Stitching

Cong Guo; Rui Zhang; Jiale Xu; Jingwen Leng; Zihan Liu; Ziyu Huang; Minyi Guo; Hao Wu; Shouren Zhao; Junping Zhao; Ke Zhang

arXiv:2401.08156·cs.DC·January 17, 2024·1 citation

TL;DR

GMLake introduces a virtual memory stitching framework that substantially reduces GPU memory usage and fragmentation in large-scale DNN training, and it operates transparently, without requiring changes to existing models.

Contribution

GMLake presents a novel GPU memory management framework using virtual memory stitching to mitigate fragmentation and enhance memory utilization in large-scale DNN training.
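To make the stitching idea concrete, here is a minimal pure-Python model, assuming page-granular mappings: several free physical chunks that are not adjacent are mapped back to back into one freshly reserved virtual address range, so the caller sees a single contiguous block. Everything in it (PAGE, PhysicalChunk, StitchedBlock, VirtualAddressSpace, stitch, and the base address) is hypothetical and is not GMLake's code or API. On real GPUs, CUDA's low-level virtual memory management driver API (cuMemAddressReserve, cuMemCreate, cuMemMap, cuMemSetAccess) provides primitives of this kind.

```python
from dataclasses import dataclass, field

PAGE = 2 * 1024 * 1024  # hypothetical mapping granularity: 2 MiB


@dataclass
class PhysicalChunk:
    chunk_id: int
    size: int  # bytes, assumed to be a multiple of PAGE


@dataclass
class StitchedBlock:
    va_start: int
    # Ordered (virtual offset, chunk) pairs: consecutive virtual ranges
    # backed by physical chunks that may sit anywhere in device memory.
    mappings: list = field(default_factory=list)

    @property
    def size(self) -> int:
        return sum(chunk.size for _, chunk in self.mappings)

    def translate(self, va: int):
        """Return (chunk_id, offset inside that chunk) for a virtual address."""
        off = va - self.va_start
        for start, chunk in self.mappings:
            if start <= off < start + chunk.size:
                return chunk.chunk_id, off - start
        raise ValueError("address outside stitched block")


class VirtualAddressSpace:
    """Hands out non-overlapping virtual ranges (no physical backing)."""

    def __init__(self, base: int = 0x7F0000000000):
        self.next_va = base

    def reserve(self, size: int) -> int:
        va = self.next_va
        self.next_va += size
        return va


def stitch(vas: VirtualAddressSpace, chunks: list) -> StitchedBlock:
    """Map free, non-contiguous chunks back to back into one reserved range."""
    block = StitchedBlock(va_start=vas.reserve(sum(c.size for c in chunks)))
    offset = 0
    for chunk in chunks:
        block.mappings.append((offset, chunk))
        offset += chunk.size
    return block


# Usage: two scattered free chunks become one 6-page virtual block.
vas = VirtualAddressSpace()
blk = stitch(vas, [PhysicalChunk(3, 2 * PAGE), PhysicalChunk(7, 4 * PAGE)])
print(blk.size // PAGE)                         # 6
print(blk.translate(blk.va_start + 3 * PAGE))   # (7, 2097152)
```

The point the model highlights is that only the virtual addresses need to be contiguous; the physical chunks stay where they are, so in this model no data has to move to produce a large block.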

Findings

01

Reduces GPU memory usage by up to 25 GB.

02

Decreases memory fragmentation by up to 33%.

03

Operates transparently within DNN frameworks, with no changes to existing models.
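The page does not say how the fragmentation figure above is measured. One common way to quantify it, assumed here rather than taken from the paper, is the share of reserved memory that does not hold live tensors at the peak. The helper name and the input numbers below are hypothetical, not results from the paper.

```python
def fragmentation_ratio(peak_active_bytes: int, peak_reserved_bytes: int) -> float:
    """Assumed metric: 1 - peak active / peak reserved memory."""
    if peak_reserved_bytes <= 0:
        raise ValueError("peak_reserved_bytes must be positive")
    if peak_active_bytes > peak_reserved_bytes:
        raise ValueError("active memory cannot exceed reserved memory")
    return 1.0 - peak_active_bytes / peak_reserved_bytes


GiB = 1024 ** 3
# Hypothetical numbers: 60 GiB of live tensors inside an 80 GiB pool.
print(fragmentation_ratio(60 * GiB, 80 * GiB))  # 0.25
```

In PyTorch, torch.cuda.max_memory_allocated() and torch.cuda.max_memory_reserved() report the two peak values this helper takes as input.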

Abstract

Large-scale deep neural networks (DNNs), such as large language models (LLMs), have revolutionized the artificial intelligence (AI) field and become increasingly popular. However, training or fine-tuning such models requires substantial computational power and resources, where the memory capacity of a single acceleration device like a GPU is one of the most important bottlenecks. Owing to the prohibitively large overhead (e.g., $10\times$) of GPUs' native memory allocator, DNN frameworks like PyTorch and TensorFlow adopt a caching allocator that maintains a memory pool with a splitting mechanism for fast memory (de)allocation. Unfortunately, the caching allocator's efficiency degrades quickly for popular memory reduction techniques such as recomputation, offloading, distributed training, and low-rank adaptation. The primary reason is that those memory reduction techniques introduce…
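The caching-allocator behaviour described above can be illustrated with a toy best-fit pool that splits blocks. This is a sketch, not PyTorch's or TensorFlow's allocator: it omits size rounding, separate size pools, and merging of adjacent free blocks, and the class and method names are hypothetical. The demo shows a pool that caches 128 MB of free memory in total yet must still reserve more for a 128 MB request, because no single cached block is large enough.

```python
import bisect


class CachingAllocator:
    """Toy best-fit caching allocator with block splitting (illustrative only)."""

    def __init__(self):
        self.free = []      # sorted (size, addr) pairs for cached free blocks
        self.reserved = 0   # bytes obtained from the expensive native allocator
        self.next_addr = 0

    def _native_alloc(self, size: int) -> int:
        # Stand-in for the slow device allocation the pool exists to avoid.
        addr = self.next_addr
        self.next_addr += size
        self.reserved += size
        return addr

    def malloc(self, size: int) -> int:
        # (size, -1) sorts before every cached block of the same size,
        # so this finds the smallest cached block with block_size >= size.
        i = bisect.bisect_left(self.free, (size, -1))
        if i < len(self.free):
            blk_size, addr = self.free.pop(i)
            if blk_size > size:
                # Split: the unused tail goes back into the pool.
                bisect.insort(self.free, (blk_size - size, addr + size))
            return addr
        return self._native_alloc(size)  # pool miss: grow the pool

    def free_block(self, addr: int, size: int) -> None:
        # Cached rather than returned to the device; no coalescing here.
        bisect.insort(self.free, (size, addr))


MB = 1 << 20
a = CachingAllocator()
blocks = [a.malloc(64 * MB) for _ in range(4)]  # 256 MB reserved
for addr in blocks[::2]:                        # free two non-adjacent blocks
    a.free_block(addr, 64 * MB)
big = a.malloc(128 * MB)  # 128 MB is cached, but only as two separate 64 MB pieces
print(a.reserved // MB)   # 384
```

Because the two freed blocks are not adjacent, even coalescing would not help in this example; it is the kind of gap that stitching non-contiguous blocks into one virtual range is meant to close.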

Code & Models

Repositories

intelligent-machine-learning/glake · PyTorch · Official

Taxonomy

Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Stochastic Gradient Optimization Techniques