# Practical, Linear-time, Fully Distributed Algorithms for Irregular Gather and Scatter

**Authors:** Jesper Larsson Träff

arXiv: 1702.05967 · 2017-04-13

## TL;DR

This paper introduces simple, fully distributed algorithms for irregular gather and scatter operations whose communication cost is linear in the total data volume and incurs only a logarithmic number of start-up latencies. Prototype MPI implementations are benchmarked on two InfiniBand clusters.

## Contribution

The paper presents novel linear-time distributed algorithms for irregular gather and scatter, with practical implementation and benchmarking in MPI, outperforming standard fixed-tree methods.

## Key findings

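- The algorithms take time $3\lceil\log_2 p\rceil\alpha+\beta\sum_{i\neq r}m_i$: a logarithmic number of start-up latencies plus a transmission cost linear in the total data.
- Prototype implementations of MPI_Gatherv and MPI_Scatterv were benchmarked on a small and a medium-large InfiniBand cluster.
- New performance guidelines relate irregular collectives to the corresponding regular collectives. The new algorithms meet these guidelines by a large margin, whereas standard implementations do not.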

## Abstract

We present new, simple, fully distributed, practical algorithms with linear time communication cost for irregular gather and scatter operations in which processors contribute or consume possibly different amounts of data. In a linear cost transmission model with start-up latency $\alpha$ and cost per unit $\beta$, the new algorithms take time $3\lceil\log_2 p\rceil\alpha+\beta \sum_{i\neq r}m_i$ where $p$ is the number of processors, $m_i$ the amount of data for processor $i, 0\leq i<p$, and processor $r, 0\leq r<p$ a root processor determined by the algorithm. For a fixed, externally given root processor $r$, there is an additive penalty of at most $\beta(M_{d'}-m_{r_{d'}}-\sum_{0\leq j<d'}M_j)$ time steps where each $M_j$ is the total amount of data in a tree of $2^j$ different processors with roots $r_j$ as constructed by the algorithm. The worst-case penalty is less than $\beta \sum_{i\neq r}m_i$ time steps. The algorithms have attractive properties for implementing the operations for MPI (the Message-Passing Interface). Standard algorithms using fixed trees take time either $\lceil\log_2 p\rceil(\alpha+\beta \sum_{i\neq r} m_i)$ in the worst case, or $\sum_{i\neq r}(\alpha+\beta m_i)$. We have used the new algorithms to give prototype implementations for the MPI_Gatherv and MPI_Scatterv collectives of MPI, and present benchmark results from a small and a medium-large InfiniBand cluster. In order to structure the experimental evaluation we formulate new performance guidelines for irregular collectives that can be used to assess the performance in relation to the corresponding regular collectives. We show that the new algorithms can fulfill these performance expectations with a large margin, and that standard implementations do not.
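To compare the three cost expressions from the abstract side by side, the sketch below evaluates them directly. The function names are hypothetical. The root `r` is treated as given, although in the paper the new algorithm chooses its own root, and a fixed external root adds a further penalty of less than $\beta\sum_{i\neq r}m_i$ that is not modeled here. $\lceil\log_2 p\rceil$ is computed exactly with integer arithmetic to avoid floating-point rounding.

```python
def ceil_log2(p):
    """Exact ceil(log2 p) for integer p >= 1."""
    return (p - 1).bit_length()


def new_algorithm_time(m, r, alpha, beta):
    """3*ceil(log2 p)*alpha + beta * sum_{i != r} m_i (root r chosen by the algorithm)."""
    return 3 * ceil_log2(len(m)) * alpha + beta * (sum(m) - m[r])


def fixed_tree_worst_time(m, r, alpha, beta):
    """ceil(log2 p) * (alpha + beta * sum_{i != r} m_i): fixed-tree worst case."""
    return ceil_log2(len(m)) * (alpha + beta * (sum(m) - m[r]))


def linear_time(m, r, alpha, beta):
    """sum_{i != r} (alpha + beta * m_i): root communicates with every processor directly."""
    return sum(alpha + beta * mi for i, mi in enumerate(m) if i != r)


if __name__ == "__main__":
    m = [1000] * 1024        # p = 1024, 1000 units per processor
    alpha, beta = 10, 1      # arbitrary illustrative units
    print(new_algorithm_time(m, 0, alpha, beta))     # 1023300
    print(fixed_tree_worst_time(m, 0, alpha, beta))  # 10230100
    print(linear_time(m, 0, alpha, beta))            # 1033230
```

In this example the direct (linear) scheme pays $(p-1)\alpha$ in start-ups and the fixed-tree worst case pays the data term $\lceil\log_2 p\rceil$ times. The new bound pays only a logarithmic latency term and the data term once.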

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1702.05967/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/1702.05967/full.md

## References

16 references — full list in the complete paper: https://tomesphere.com/paper/1702.05967/full.md

---
Source: https://tomesphere.com/paper/1702.05967