Peering Beyond the Gradient Veil with Distributed Auto Differentiation

Bradley T. Baker; Aashis Khanal; Vince D. Calhoun; Barak Pearlmutter; Sergey M. Plis

arXiv:2102.09631·cs.LG·February 4, 2022

TL;DR

This paper introduces distributed auto-differentiation (dAD), a method for training distributed deep neural networks that exploits the outer-product structure of gradients to reduce communication overhead.

Contribution

The paper presents dAD, a new distributed training algorithm that leverages gradient structure for improved communication efficiency over traditional gradient-sharing methods.

Findings

1. dAD trains more efficiently than state-of-the-art methods on transformers.

2. dAD reduces communication overhead in distributed deep learning.

3. dAD is effective on large-scale text and imaging datasets.

Abstract

Although distributed machine learning has opened up many new and exciting research frontiers, fragmentation of models and data across different machines, nodes, and sites still results in considerable communication overhead, impeding reliable training in real-world contexts. The focus on gradients as the primary shared statistic during training has spawned a number of intuitive algorithms for distributed deep learning; however, gradient-centric training of large deep neural networks (DNNs) tends to be communication-heavy, often requiring additional adaptations such as sparsity constraints, compression, quantization, and more, to curtail bandwidth. We introduce an innovative, communication-friendly approach for training distributed DNNs, which capitalizes on the outer-product structure of the gradient as revealed by the mechanics of auto-differentiation. The exposed structure of the…
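
The outer-product structure mentioned in the abstract can be illustrated with a small sketch. It is not the paper's dAD algorithm, only the standard identity for a dense layer Y = XW: the weight gradient is Xᵀ·Δ, where Δ is the backpropagated error with respect to Y. The two-site setup, the layer sizes, and every function and variable name below are assumptions made for illustration.

```python
import random


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def weight_gradient(activations, deltas):
    """Weight gradient of a dense layer Y = X @ W.

    activations: batch x d_in matrix of layer inputs X
    deltas:      batch x d_out matrix of backpropagated errors dL/dY
    Returns X^T @ deltas (d_in x d_out), i.e. the sum over samples of the
    outer products of each input row with its error row.
    """
    return matmul(transpose(activations), deltas)


def random_matrix(rows, cols, rng):
    """Hypothetical helper: a rows x cols matrix of standard normal draws."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
batch, d_in, d_out = 4, 64, 32  # assumed sizes, chosen only for the demo

# Two hypothetical sites, each holding (activations, deltas) for its local batch.
site_a = (random_matrix(batch, d_in, rng), random_matrix(batch, d_out, rng))
site_b = (random_matrix(batch, d_in, rng), random_matrix(batch, d_out, rng))

# Gradient sharing: each site forms its full d_in x d_out gradient, then they are summed.
grad_a = weight_gradient(*site_a)
grad_b = weight_gradient(*site_b)
summed_gradients = [
    [ga + gb for ga, gb in zip(row_a, row_b)] for row_a, row_b in zip(grad_a, grad_b)
]

# Factor sharing: pool the two factors across sites and form the gradient once.
pooled_activations = site_a[0] + site_b[0]
pooled_deltas = site_a[1] + site_b[1]
gradient_from_factors = weight_gradient(pooled_activations, pooled_deltas)

# Both routes give the same matrix up to floating-point rounding.
max_diff = max(
    abs(s - f)
    for row_s, row_f in zip(summed_gradients, gradient_from_factors)
    for s, f in zip(row_s, row_f)
)

# Numbers each site would have to send for this one layer.
floats_per_site_gradient = d_in * d_out
floats_per_site_factors = batch * (d_in + d_out)

print(max_diff, floats_per_site_gradient, floats_per_site_factors)
```

In this toy setting the factors are smaller than the full gradient only because batch × (d_in + d_out) is less than d_in × d_out; with large batches or narrow layers the comparison can go the other way.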

Taxonomy

Topics: AI in cancer detection · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis