Efficient Embedding of MPI Collectives in MXNET DAGs for scaling Deep Learning

Amith R Mamidala

arXiv:1802.06949 · cs.DC · February 21, 2018

TL;DR

This paper presents efficient methods for integrating MPI collective operations into MXNET's DAG-based deep learning framework. These methods enable scalable training on large GPU clusters, reaching 256 GPUs with 50-second epoch times on ImageNet.

Contribution

It introduces three novel designs for embedding MPI collectives in MXNET DAGs: Funneled, Concurrent communication, and Dependency chaining. These designs overlap communication with computation, which improves scalability and performance.
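To make the overlap idea concrete, here is a toy, single-process Python sketch of a funneled arrangement. It assumes that "funneled" means only one dedicated thread ever issues communication calls, while compute tasks hand their gradients off and continue. No MPI is involved. The names `fake_allreduce`, `WORLD_SIZE`, `backward_layer`, and `train_step` are hypothetical, and the stand-in "allreduce" pretends that every rank holds an identical gradient. This is not the paper's KVStore code.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

WORLD_SIZE = 4  # hypothetical number of simulated ranks

comm_threads = set()  # ids of threads that performed a "collective"

def fake_allreduce(layer, grad):
    """Stand-in for an MPI sum-allreduce (hypothetical; no MPI is used).

    Pretends every rank holds the same gradient, so the sum is grad * WORLD_SIZE.
    """
    comm_threads.add(threading.get_ident())
    return [g * WORLD_SIZE for g in grad]

# Funneled: a single-worker executor means exactly one thread ever runs
# communication calls, and it serves requests in the order they were submitted.
comm = ThreadPoolExecutor(max_workers=1)

def backward_layer(layer):
    """Toy compute task: produce a gradient, hand it to the comm thread,
    and return immediately so computation can continue (overlap)."""
    grad = [float(layer), float(layer) + 0.5]
    return comm.submit(fake_allreduce, layer, grad)

def train_step(num_layers=5, compute_workers=3):
    """Run backward tasks on several compute threads, then wait for reductions."""
    with ThreadPoolExecutor(max_workers=compute_workers) as compute:
        pending = list(compute.map(backward_layer, range(num_layers)))
    return {layer: fut.result() for layer, fut in enumerate(pending)}

reduced = train_step()
comm.shutdown()
```

After running, `reduced[2]` is `[8.0, 10.0]`, and `comm_threads` holds a single thread id. Even though three compute threads submitted work, only one thread touched the "communicator". The order in which requests reach that thread still depends on how compute tasks happen to be scheduled. That order can differ from rank to rank, and it is exactly the cross-rank ordering hazard a real design must control.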

Findings

1. Scales to 256 GPUs with 50-second epoch times on ImageNet.

2. Demonstrates overlap of communication and computation in DAG execution.

3. Achieves efficient distributed training on large GPU clusters.

Abstract

The availability of high performance computing infrastructures such as clusters of GPUs and CPUs has fueled the growth of distributed learning systems. Deep Learning frameworks express neural nets as DAGs and execute these DAGs on computation resources such as GPUs. In this paper, we propose efficient designs for embedding MPI collective operations into data parallel DAGs. Incorrect designs can easily lead to deadlocks or program crashes. In particular, we demonstrate three designs for using MPI collectives with DAGs: Funneled, Concurrent communication, and Dependency chaining. These designs automatically overlap computation with communication by allowing collectives to execute concurrently with the other tasks. We implement these designs directly in the KVStore API of MXNET, which lets us leverage the rest of MXNET's infrastructure directly. Using ImageNet and CIFAR data sets, we show…
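Why can an incorrect design deadlock? MPI requires every process in a communicator to call collectives in the same order. A DAG scheduler, however, may run ready tasks in a different order on each rank. The toy simulation below contains no MPI and no MXNET. It uses hypothetical node names `bwd_i` and `allreduce_i`, and random seeds stand in for different ranks' scheduling decisions. It sketches one plausible reading of dependency chaining: each collective node is given an extra DAG dependency on the previous collective. Every rank then issues the collectives in one fixed sequence, while they can still overlap with backward computation.

```python
import random
from collections import defaultdict

def build_dag(num_layers, chain):
    """Toy data-parallel backward DAG (hypothetical, not MXNET's engine).

    bwd_i computes layer i's gradient; backward runs from the last layer
    down, so bwd_i depends on bwd_{i+1}. allreduce_i depends on bwd_i.
    With chain=True, allreduce_i also waits for allreduce_{i+1}, so the
    collectives form a single fixed sequence (dependency chaining).
    """
    deps = defaultdict(set)
    nodes = []
    for i in reversed(range(num_layers)):
        nodes += [f"bwd_{i}", f"allreduce_{i}"]
        deps[f"allreduce_{i}"].add(f"bwd_{i}")
        if i + 1 < num_layers:
            deps[f"bwd_{i}"].add(f"bwd_{i + 1}")
            if chain:
                deps[f"allreduce_{i}"].add(f"allreduce_{i + 1}")
    return nodes, deps

def run(nodes, deps, seed):
    """Execute the DAG, picking any ready node at random (a stand-in for a
    nondeterministic scheduler); return the order collectives are issued."""
    rng = random.Random(seed)
    waiting = {n: set(deps[n]) for n in nodes}
    issued = []
    while waiting:
        ready = sorted(n for n, d in waiting.items() if not d)
        node = rng.choice(ready)
        del waiting[node]
        for d in waiting.values():
            d.discard(node)
        if node.startswith("allreduce_"):
            issued.append(node)
    return issued

# Each seed plays the role of a different rank's scheduling decisions.
unchained = {tuple(run(*build_dag(4, chain=False), seed)) for seed in range(50)}
chained = {tuple(run(*build_dag(4, chain=True), seed)) for seed in range(50)}
```

Without chaining, different seeds can issue the collectives in different orders. With blocking MPI calls, that is the cross-rank mismatch that can hang a job. With chaining, every seed yields `allreduce_3, allreduce_2, allreduce_1, allreduce_0`. The trade-off is that collectives are serialized among themselves.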

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Memory and Neural Computing · Stochastic Gradient Optimization Techniques