Data-parallel distributed training of very large models beyond GPU capacity

Samuel Matzek; Max Grossman; Minsik Cho; Anar Yusifov; Bryant Nelson; Amit Juneja

arXiv:1811.12174 · cs.DC · November 30, 2018 · 1 citation

TL;DR

This paper presents Large Model Support (LMS), a tool for training deep learning models that exceed GPU memory by swapping tensors between CPU and GPU memory over high-bandwidth CPU-GPU connections, combined with distributed data-parallel training.

Contribution

The paper introduces LMS, an open-source tool that uses CPU memory, reached over high-bandwidth NVLink CPU-GPU connections, to train models too large for GPU memory, and shows how it can be combined with data-parallel training across multiple GPUs.
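To make "data-parallel" concrete, here is a minimal sketch of one synchronous data-parallel SGD step. It is not the paper's MPI-based module: the 1-D linear model, the function names, and `allreduce_mean` (a plain-Python stand-in for an MPI allreduce followed by division by the number of ranks) are all hypothetical and exist only for illustration. What it shows is the property that data-parallel training relies on: with equal-sized shards, averaging the per-rank gradients gives the same gradient as processing the whole batch on one device.

```python
"""Toy data-parallel SGD step with a simulated allreduce (no real MPI)."""
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y) pair for a hypothetical 1-D model y = w * x


def local_gradient(w: float, shard: List[Sample]) -> float:
    """Gradient with respect to w of mean((w*x - y)**2) over one shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)


def allreduce_mean(values: List[float]) -> float:
    """Hypothetical stand-in for an MPI allreduce (sum) divided by the number of ranks."""
    return sum(values) / len(values)


def data_parallel_step(w: float, batch: List[Sample], world_size: int, lr: float) -> float:
    """One synchronous SGD step with the global batch split across world_size ranks."""
    if len(batch) % world_size != 0:
        raise ValueError("batch size must be divisible by world_size")
    k = len(batch) // world_size
    shards = [batch[r * k:(r + 1) * k] for r in range(world_size)]
    # Each rank computes its local gradient independently (on its own GPU in practice).
    grads = [local_gradient(w, shard) for shard in shards]
    # After the allreduce every rank holds the same averaged gradient, so all
    # replicas apply the identical update and their weights stay in sync.
    g = allreduce_mean(grads)
    return w - lr * g


if __name__ == "__main__":
    batch = [(float(x), 3.0 * x) for x in range(1, 9)]
    print(data_parallel_step(0.0, batch, world_size=4, lr=0.01))
    print(data_parallel_step(0.0, batch, world_size=1, lr=0.01))
```

Both calls print the same value (about 1.53): splitting the batch over four simulated ranks does not change the update. In the setting the paper describes, each rank would additionally use LMS to swap its own tensors to CPU memory, which this sketch does not model.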

Findings

01

LMS swaps tensors between CPU and GPU memory so that only the tensors needed for the current training step are kept in GPU memory.

02

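Combining LMS with an MPI-based distributed deep learning module enables data-parallel training of such models across multiple GPUs, with each GPU swapping tensors to CPU memory.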

03

High-bandwidth CPU-GPU links such as NVLink are crucial for efficient large-model training with tensor swapping.
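Why link bandwidth matters can be seen from a back-of-the-envelope estimate. The sketch below assumes that all swap traffic in a step is serialized over one CPU-GPU link and, in the `overlap=True` case, that swapping runs fully concurrently with compute. The per-step compute time, swap volume, and the two bandwidths are hypothetical illustrative figures, not measurements from the paper.

```python
"""Back-of-the-envelope cost of per-step tensor swapping over a CPU-GPU link."""

GB = 1e9  # decimal gigabyte, in bytes


def transfer_seconds(num_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Time to move num_bytes across a link at a sustained bandwidth."""
    return num_bytes / bandwidth_bytes_per_s


def step_seconds(compute_s: float, swap_bytes: float, bandwidth: float,
                 overlap: bool = True) -> float:
    """Estimated step time; overlap=True assumes swaps run fully concurrently with compute."""
    t_swap = transfer_seconds(swap_bytes, bandwidth)
    return max(compute_s, t_swap) if overlap else compute_s + t_swap


def min_bandwidth_to_hide(swap_bytes: float, compute_s: float) -> float:
    """Smallest link bandwidth at which swap traffic fits inside the compute time."""
    return swap_bytes / compute_s


if __name__ == "__main__":
    compute_s, swap_bytes = 0.5, 24 * GB  # hypothetical per-step figures
    for name, bw in [("slower link", 16 * GB), ("faster link", 64 * GB)]:
        print(name, step_seconds(compute_s, swap_bytes, bw))
    print("needed:", min_bandwidth_to_hide(swap_bytes, compute_s) / GB, "GB/s")
```

Under these assumed numbers, the slower link stretches a 0.5 s step to 1.5 s, while the faster link hides the swaps entirely; any link below 48 GB/s would leave swap time exposed.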

Abstract

GPUs have limited memory, which makes it difficult to train wide and/or deep models whose training process would otherwise run out of memory. This paper shows how an open-source tool called Large Model Support (LMS) can use a high-bandwidth NVLink connection between CPUs and GPUs to train deep convolutional networks. LMS swaps tensors between CPU memory and GPU memory so that only the minimal set of tensors required by a training step is kept in GPU memory. The paper also shows how LMS can be combined with an MPI-based distributed deep learning module to train models in a data-parallel fashion across multiple GPUs, with each GPU using CPU memory for tensor swapping. The hardware architecture that enables the high-bandwidth link between the GPU and the CPU is discussed, as well as the associated set of software tools that are available as the PowerAI…
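The swapping idea can be illustrated with a toy memory simulation. This is a hypothetical sketch, not the LMS implementation or its swapping policy: it models a chain of layers where layer i reads activation a{i-1} and writes a{i}, and every activation is read again during the backward pass. With swapping enabled, an activation is moved to (unbounded) host memory once the forward pass no longer needs it, and it is swapped back in just before its backward use. Sizes are in arbitrary units, and all names are made up for illustration.

```python
"""Toy simulation of CPU-GPU tensor swapping (illustrative only, not the LMS code)."""
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Memory:
    """A named memory pool with a capacity in arbitrary size units."""
    capacity: float
    tensors: Dict[str, int] = field(default_factory=dict)
    peak: int = 0

    def used(self) -> int:
        return sum(self.tensors.values())

    def put(self, name: str, size: int) -> None:
        if self.used() + size > self.capacity:
            raise MemoryError(f"out of memory allocating {name} ({size} units)")
        self.tensors[name] = size
        self.peak = max(self.peak, self.used())

    def pop(self, name: str) -> int:
        return self.tensors.pop(name)


def train_step(act_sizes: List[int], gpu_capacity: float, swap: bool) -> Dict[str, int]:
    """Simulate one forward and backward pass over a chain of layers."""
    gpu = Memory(gpu_capacity)
    host = Memory(float("inf"))  # CPU memory, assumed large enough for everything
    swapped = 0
    # Forward pass: layer i needs a{i-1} as input and produces a{i}.
    for i, size in enumerate(act_sizes):
        gpu.put(f"a{i}", size)
        if swap and i > 0:
            # a{i-1} is not needed again until the backward pass: swap it out.
            host.put(f"a{i-1}", gpu.pop(f"a{i-1}"))
            swapped += act_sizes[i - 1]
    # Backward pass: visit activations in reverse order.
    for i in reversed(range(len(act_sizes))):
        name = f"a{i}"
        if name in host.tensors:
            gpu.put(name, host.pop(name))  # swap in just before it is used
            swapped += act_sizes[i]
        gpu.pop(name)  # layer i's gradient is computed; activation freed
    return {"gpu_peak": gpu.peak, "swapped": swapped}


if __name__ == "__main__":
    sizes = [4, 4, 4, 4]
    print(train_step(sizes, gpu_capacity=10, swap=True))
    try:
        train_step(sizes, gpu_capacity=10, swap=False)
    except MemoryError as err:
        print("without swapping:", err)
```

With four activations of 4 units and a 10-unit "GPU", the run without swapping fails when the third activation is allocated, while the swapping run peaks at 8 units at the cost of 24 units of host-device traffic per step. That traffic volume is why the bandwidth of the CPU-GPU link matters.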

Taxonomy

Topics: Topic Modeling · Advanced Neural Network Applications · Parallel Computing and Optimization Techniques