Memory-efficient array redistribution through portable collective communication

Norman A. Rink; Adam Paszke; Dimitrios Vytiniotis; Georg Stefan Schmid

arXiv:2112.01075·cs.DC·November 29, 2022·1 cites

Open Access

TL;DR

This paper presents a type-directed method for synthesizing array redistributions in SPMD deep learning computations as sequences of MPI-style collective operations, with formal proofs that the synthesized redistributions are memory-efficient and perform no excessive data transfers.

Contribution

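We propose a type-directed approach that synthesizes array redistributions as sequences of MPI-style collective operations, and we prove formally that the synthesized redistributions are memory-efficient and perform no excessive data transfers. Redistribution through collective operations is also implemented in the XLA SPMD partitioner, a production-grade partitioning tool.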

Findings

01. Achieves a 1.22x average speedup over existing methods

02. Reaches a maximum observed speedup of 5.7x

03. Provides provable memory guarantees for large-scale models

Abstract

Modern large-scale deep learning workloads highlight the need for parallel execution across many devices in order to fit model data into hardware accelerator memories. In these settings, array redistribution may be required during a computation, but can also become a bottleneck if not done efficiently. In this paper we address the problem of redistributing multi-dimensional array data in SPMD computations, the most prevalent form of parallelism in deep learning. We present a type-directed approach to synthesizing array redistributions as sequences of MPI-style collective operations. We prove formally that our synthesized redistributions are memory-efficient and perform no excessive data transfers. Array redistribution for SPMD computations using collective operations has also been implemented in the context of the XLA SPMD partitioner, a production-grade tool for partitioning programs…
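To make the memory argument concrete, the following is a minimal sketch of one redistribution (row-sharded to column-sharded) done two ways. It is not the paper's synthesis algorithm. Plain Python lists stand in for devices, and every function name here (`shard_rows`, `shard_cols`, `all_gather_then_slice`, `all_to_all`) is a hypothetical helper, not an MPI or XLA call. The reported "peak" is simply the largest single buffer a simulated device materializes, counted in elements.

```python
# Illustrative simulation only: lists stand in for devices, and all helper
# names are hypothetical. This is not the paper's synthesis procedure.

def shard_rows(array, n):
    """Split a 2D list into n equal row blocks, one per simulated device."""
    rows = len(array) // n
    return [array[d * rows:(d + 1) * rows] for d in range(n)]

def shard_cols(array, n):
    """Split a 2D list into n equal column blocks, one per simulated device."""
    cols = len(array[0]) // n
    return [[row[d * cols:(d + 1) * cols] for row in array] for d in range(n)]

def all_gather_then_slice(shards):
    """Baseline: each device gathers the full array, then keeps its columns."""
    n = len(shards)
    full = [row for shard in shards for row in shard]  # whole array per device
    peak = len(full) * len(full[0])
    return shard_cols(full, n), peak

def all_to_all(shards):
    """Each device sends column chunk j of its row block to device j."""
    n = len(shards)
    cols = len(shards[0][0]) // n
    out = []
    for dst in range(n):
        block = []
        for src in range(n):  # chunks are stacked in source order
            block.extend(row[dst * cols:(dst + 1) * cols] for row in shards[src])
        out.append(block)
    peak = len(out[0]) * len(out[0][0])  # only a shard-sized buffer is built
    return out, peak

array = [[r * 4 + c for c in range(4)] for r in range(4)]
row_shards = shard_rows(array, 2)
gathered, gather_peak = all_gather_then_slice(row_shards)
exchanged, exchange_peak = all_to_all(row_shards)
print(gathered == exchanged)       # True: both yield the column sharding
print(gather_peak, exchange_peak)  # 16 8
```

Both routes produce the same column-sharded result, but the gather-based route builds the full 16-element array on every device, while the all-to-all route never builds a buffer larger than one 8-element shard. A real implementation would also account for send and receive buffers, which this sketch ignores.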

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Stochastic Gradient Optimization Techniques · Advanced Data Storage Technologies