Efficient distributed algorithms for Convolutional Neural Networks

Rui Li; Yufan Xu; Aravind Sukumaran-Rajam; Atanas Rountev; P. Sadayappan

arXiv:2105.13480·cs.DC·July 29, 2021

TL;DR

This paper introduces communication-efficient distributed-memory algorithms for CNNs, modeled on the 2D SUMMA, 2.5D, and 3D matrix-multiplication algorithms, which trade per-node memory against inter-node communication volume.

Contribution

It generalizes matrix multiplication algorithms to CNN computations, providing new distributed-memory algorithms that improve communication efficiency.
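To make the relationship concrete, here is a minimal single-node sketch, not taken from the paper, of a CNN layer written as a loop nest. It assumes stride 1, no padding, and the cross-correlation form; the names `conv2d` and `matmul` and the index letters are illustrative choices. The contraction over channels `c` is the matrix-multiplication part, and the loops over `r` and `s` are the stencil part. With a 1x1 kernel the layer reduces to an ordinary matrix product.

```python
def conv2d(inp, ker):
    """Direct CNN layer (assumed: stride 1, no padding, cross-correlation).

    inp[c][h][w] holds C input channels; ker[k][c][r][s] holds K filters.
    Returns out[k][h][w] with shape K x (H-R+1) x (W-S+1).
    """
    C, H, W = len(inp), len(inp[0]), len(inp[0][0])
    K, R, S = len(ker), len(ker[0][0]), len(ker[0][0][0])
    Ho, Wo = H - R + 1, W - S + 1
    out = [[[0] * Wo for _ in range(Ho)] for _ in range(K)]
    for k in range(K):
        for h in range(Ho):
            for w in range(Wo):
                for c in range(C):        # matmul-style contraction over channels
                    for r in range(R):    # stencil over the kernel window
                        for s in range(S):
                            out[k][h][w] += ker[k][c][r][s] * inp[c][h + r][w + s]
    return out


def matmul(a, b):
    """Plain matrix product; a is M x N and b is N x P."""
    return [[sum(a[i][n] * b[n][j] for n in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


# A 1x1 kernel over a one-row image is exactly A @ B.
A = [[1, 2], [3, 4]]
B = [[5, 6, 7], [8, 9, 10]]
ker_1x1 = [[[[a]] for a in row] for row in A]   # ker[k][c][0][0] = A[k][c]
inp_1row = [[row] for row in B]                 # inp[c][0][w]    = B[c][w]
out = conv2d(inp_1row, ker_1x1)
assert [plane[0] for plane in out] == matmul(A, B)
```

Because the extra `r` and `s` loops only shift which input elements are read, a data distribution designed for matrix multiplication can in principle be reused, with additional neighbor data needed at block boundaries.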

Findings

01

Algorithms reduce inter-node communication volume

02

Algorithms trade per-node memory against inter-node communication volume

03

Framework applicable to various CNN architectures

Abstract

Several efficient distributed algorithms have been developed for matrix-matrix multiplication: the 3D algorithm, the 2D SUMMA algorithm, and the 2.5D algorithm. Each of these algorithms was independently conceived, and they trade off the memory needed per node against the inter-node data communication volume. The convolutional neural network (CNN) computation may be viewed as a generalization of matrix multiplication combined with neighborhood stencil computations. We develop communication-efficient distributed-memory algorithms for CNNs that are analogous to the 2D/2.5D/3D algorithms for matrix-matrix multiplication.
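As background for the 2D case, the following is a minimal single-process simulation of 2D SUMMA for square matrices, not the paper's CNN algorithm. It assumes a `p x p` process grid, a matrix size divisible by `p`, and a panel width equal to the block size. The function name `summa` and the `words_received` counter are illustrative; the counter is a deliberately simple model that charges one word per matrix element a process receives from another process, not the paper's cost analysis.

```python
def matmul(a, b):
    """Plain matrix product; a is M x N and b is N x P."""
    return [[sum(a[i][n] * b[n][j] for n in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def add(a, b):
    """Elementwise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def block(m, i, j, bs):
    """Extract the bs x bs block at block-row i, block-column j."""
    return [row[j * bs:(j + 1) * bs] for row in m[i * bs:(i + 1) * bs]]


def summa(A, B, p):
    """Simulate 2D SUMMA on a p x p grid (assumes square n x n, n % p == 0)."""
    n = len(A)
    assert n % p == 0
    bs = n // p
    grid = [(i, j) for i in range(p) for j in range(p)]
    # Simulated process (i, j) owns blocks A_ij and B_ij and accumulates C_ij.
    a_blk = {(i, j): block(A, i, j, bs) for i, j in grid}
    b_blk = {(i, j): block(B, i, j, bs) for i, j in grid}
    c_blk = {(i, j): [[0] * bs for _ in range(bs)] for i, j in grid}
    words_received = 0
    for k in range(p):
        for i, j in grid:
            a_panel = a_blk[(i, k)]   # broadcast by (i, k) along process row i
            b_panel = b_blk[(k, j)]   # broadcast by (k, j) along process column j
            if j != k:
                words_received += bs * bs   # A panel came from another process
            if i != k:
                words_received += bs * bs   # B panel came from another process
            c_blk[(i, j)] = add(c_blk[(i, j)], matmul(a_panel, b_panel))
    # Gather the distributed C blocks back into one matrix for checking.
    C = [[0] * n for _ in range(n)]
    for (i, j), blk in c_blk.items():
        for r in range(bs):
            C[i * bs + r][j * bs:(j + 1) * bs] = blk[r]
    return C, words_received


A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
B = [[1, 0, 2, 0], [0, 1, 0, 2], [3, 0, 1, 0], [0, 3, 0, 1]]
C, words = summa(A, B, p=2)
assert C == matmul(A, B)
```

Under this counting model each process keeps only one block of A, B, and C, which is the low-memory end of the trade-off; the 2.5D and 3D algorithms replicate data across additional process layers to cut communication at the cost of more memory per node.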
