Accelerating Neural Network Training with Distributed Asynchronous and Selective Optimization (DASO)

Daniel Coquelin, Charlotte Debus, Markus Götz, Fabrice von der Lehr, James Kahn, Martin Siggel, and Achim Streit

arXiv:2104.05588·cs.LG·February 9, 2022

TL;DR

This paper introduces DASO, a novel distributed asynchronous optimization method that reduces neural network training time by up to 34% using hierarchical communication and adaptive synchronization in multi-GPU clusters.

Contribution

DASO is a new distributed training approach that employs hierarchical asynchronous communication and adaptive synchronization to accelerate neural network training.

Findings

01

Achieves up to 34% reduction in training time.

02

Effectively utilizes multi-GPU architectures.

03

Outperforms existing data parallel methods.

Abstract

With increasing data and model complexities, the time required to train neural networks has become prohibitively large. To address the exponential rise in training time, users are turning to data parallel neural networks (DPNN) to utilize large-scale distributed resources on computer clusters. Current DPNN approaches implement the network parameter updates by synchronizing and averaging gradients across all processes with blocking communication operations. This synchronization is the central algorithmic bottleneck. To combat this, we introduce the Distributed Asynchronous and Selective Optimization (DASO) method, which leverages multi-GPU compute node architectures to accelerate network training. DASO uses a hierarchical and asynchronous communication scheme composed of node-local and global networks while adjusting the global synchronization rate during the learning process. We show…
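The blocking gradient averaging that the abstract describes, and the node-local/global split that DASO builds on, can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names (`average`, `flat_allreduce_mean`, `hierarchical_mean`) and the `gpus_per_node` parameter are hypothetical, gradients are plain lists, and communication is simulated in a single process with no asynchrony and no adaptive synchronization rate.

```python
from statistics import fmean


def average(vectors):
    """Element-wise mean of equally sized gradient vectors."""
    return [fmean(column) for column in zip(*vectors)]


def flat_allreduce_mean(grads_per_gpu):
    """Baseline DPNN step: average the gradients of all GPUs at once.

    In a real cluster this is a blocking collective over every process.
    """
    return average(grads_per_gpu)


def hierarchical_mean(grads_per_gpu, gpus_per_node):
    """Two-stage averaging (illustrative assumption, not DASO itself).

    Stage 1: average within each node (node-local network).
    Stage 2: average the node results across nodes (global network).
    """
    if len(grads_per_gpu) % gpus_per_node != 0:
        raise ValueError("GPU count must be a multiple of gpus_per_node")
    # Group consecutive GPUs into nodes of equal size.
    nodes = [
        grads_per_gpu[i:i + gpus_per_node]
        for i in range(0, len(grads_per_gpu), gpus_per_node)
    ]
    node_local_means = [average(node) for node in nodes]
    return average(node_local_means)


if __name__ == "__main__":
    # Four simulated GPUs, two per node, each holding a 2-element gradient.
    grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
    print(flat_allreduce_mean(grads))
    print(hierarchical_mean(grads, gpus_per_node=2))
```

With equally sized nodes, the two-stage average equals the flat average, so the hierarchy changes where the communication happens rather than the result of a full synchronization.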
