A Doubly-pipelined, Dual-root Reduction-to-all Algorithm and Implementation

Jesper Larsson Träff

arXiv:2109.12626·cs.DC·January 21, 2022

Open Access

TL;DR

This paper introduces a doubly-pipelined, dual-root, binary tree-based algorithm for MPI_Allreduce. It exploits bidirectional communication, takes three communication steps per pipeline block at each non-leaf processor, and relies on a suitably chosen pipeline block size for its performance.

Contribution

The paper presents a new binary tree-based, doubly-pipelined allreduce algorithm with dual roots, enhancing efficiency by leveraging bidirectional communication capabilities.

Findings

01 Achieves lower latency with optimal pipeline block size.

02 Outperforms traditional reduce-broadcast and native MPI_Allreduce.

03 Effective on small, modern processor clusters.

Abstract

We discuss a simple, binary tree-based algorithm for the collective allreduce (reduction-to-all, MPI_Allreduce) operation for parallel systems consisting of $p$ suitably interconnected processors. The algorithm can be doubly pipelined to exploit bidirectional (telephone-like) communication capabilities of the communication system. In order to make the algorithm more symmetric, the processors are organized into two rooted trees with communication between the two roots. For each pipeline block, each non-leaf processor takes three communication steps, consisting in receiving and sending from and to the two children, and sending and receiving to and from the root. In a round-based, uniform, linear-cost communication model in which simultaneously sending and receiving $n$ data elements takes time $α + β n$ for system dependent constants $α$ (communication start-up latency) and…
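The role of the pipeline block size in the linear-cost model can be illustrated with a small sketch. It is not taken from the paper: it assumes a generic pipeline in which the first block needs `fill_rounds` communication rounds to complete (a hypothetical parameter standing in for the tree depth) and every further block adds three rounds, matching the three steps per block described in the abstract. The function names and parameters are illustrative only.

```python
import math


def pipelined_time(m, blocks, alpha, beta, fill_rounds, steps_per_block=3):
    """Estimated time for m elements sent as `blocks` equal pipeline blocks.

    Assumed model (not the paper's exact analysis): every round costs
    alpha + beta * block_size, the first block takes `fill_rounds` rounds,
    and each additional block adds `steps_per_block` rounds.
    """
    block_size = m / blocks
    rounds = fill_rounds + steps_per_block * (blocks - 1)
    return rounds * (alpha + beta * block_size)


def best_block_count(m, alpha, beta, fill_rounds, steps_per_block=3):
    """Integer block count in 1..m that minimizes pipelined_time."""
    extra = fill_rounds - steps_per_block
    if extra <= 0:
        # With no extra fill latency the model is increasing in the block count.
        return 1
    # Real-valued minimizer of steps*b*alpha + extra*beta*m/b.
    ideal = math.sqrt(extra * beta * m / (steps_per_block * alpha))
    # The model is convex in b, so the best integer is floor or ceil of ideal.
    candidates = {
        min(max(math.floor(ideal), 1), m),
        min(max(math.ceil(ideal), 1), m),
    }
    return min(
        candidates,
        key=lambda b: pipelined_time(m, b, alpha, beta, fill_rounds, steps_per_block),
    )


if __name__ == "__main__":
    m, alpha, beta, fill_rounds = 1200, 3.0, 1.0, 15
    b = best_block_count(m, alpha, beta, fill_rounds)
    print(b, pipelined_time(m, b, alpha, beta, fill_rounds))
```

Under these assumptions, sending everything as a single block pays the full $β m$ in every fill round, while very many small blocks pay the start-up latency $α$ too often; the best block count lies in between and leaves a bandwidth term of $3 β m$ plus lower-order terms.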

Taxonomy

Topics: Interconnection Networks and Systems · Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies