Benchmarking network fabrics for data distributed training of deep neural networks

Siddharth Samsi; Andrew Prout; Michael Jones; Andrew Kirby; Bill Arcand; Bill Bergeron; David Bestor; Chansup Byun; Vijay Gadepally; Michael Houle; Matthew Hubbell; Anna Klein; Peter Michaleas; Lauren Milechin; Julie Mullen; Antonio Rosa; Charles Yee; Albert Reuther; Jeremy Kepner

arXiv:2008.08057·cs.DC·September 8, 2021

TL;DR

This paper evaluates how different network fabrics and communication primitives impact the training efficiency of deep neural networks in distributed data parallel settings, finding minimal effect of Ethernet-based networking on training times.

Contribution

It provides a comparative analysis of network interconnects and communication methods like GPUDirect and NCCL in distributed deep learning training.

Findings

01. Ethernet-based networking shows negligible impact on training times.

02. GPUDirect and NCCL improve communication efficiency.

03. Network fabric choice has limited effect on training performance.

Abstract

Artificial Intelligence/Machine Learning applications require the training of complex models on large amounts of labelled data. The large computational requirements for training deep models have necessitated the development of new methods for faster training. One such approach is the data parallel approach, where the training data is distributed across multiple compute nodes. This approach is simple to implement and supported by most of the commonly used machine learning frameworks. The data parallel approach leverages MPI for communicating gradients across all nodes. In this paper, we examine the effects of using different physical hardware interconnects and network-related software primitives for enabling data distributed deep learning. We compare the effect of using GPUDirect and NCCL on Ethernet and OmniPath fabrics. Our results show that using Ethernet-based networking in shared…
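
The data parallel approach described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' code and uses no real MPI, NCCL, or GPUDirect calls: the function names (`shard`, `local_gradient`, `allreduce_mean`, `train`), the toy linear model, and the hyperparameters are all assumptions chosen for illustration. Each simulated worker computes a gradient on its own shard of the data, and an in-process stand-in for an allreduce averages those gradients so every model replica applies the same update.

```python
# Sketch: simulated data-parallel gradient averaging.
# Assumption: workers are simulated in one process; no real MPI/NCCL is used.

def shard(data, num_workers):
    """Split the dataset into num_workers interleaved shards."""
    return [data[rank::num_workers] for rank in range(num_workers)]

def local_gradient(w, samples):
    """Mean gradient of the squared error (w*x - y)^2 with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in samples) / len(samples)

def allreduce_mean(values):
    """Stand-in for an allreduce: every worker receives the mean of all values."""
    mean = sum(values) / len(values)
    return [mean] * len(values)

def train(data, num_workers=4, lr=0.01, steps=200):
    shards = shard(data, num_workers)
    weights = [0.0] * num_workers  # one model replica per simulated worker
    for _ in range(steps):
        # Each worker computes a gradient on its own shard only.
        grads = [local_gradient(w, s) for w, s in zip(weights, shards)]
        # Gradients are averaged across workers before the update.
        synced = allreduce_mean(grads)
        weights = [w - lr * g for w, g in zip(weights, synced)]
    return weights

# Toy dataset following y = 3x; the true weight is 3.0.
data = [(float(x), 3.0 * x) for x in range(1, 9)]
replicas = train(data)
```

In a real cluster the averaging step is the part that crosses the network, which is why the interconnect and the communication library are the variables the paper benchmarks.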
