Communication Round and Computation Efficient Exclusive Prefix-Sums Algorithms (for MPI_Exscan)
Jesper Larsson Träff

TL;DR
This paper introduces a new, simple algorithm for exclusive prefix sums in parallel systems that needs fewer communication rounds than traditional methods, together with a practical MPI implementation and a performance comparison.
Contribution
A novel algorithm for exclusive prefix sums that minimizes communication rounds and is simpler than existing solutions, improving efficiency in MPI systems.
Findings
The new algorithm computes exclusive prefix sums in fewer communication rounds.
MPI implementation shows potential improvements over native MPI_Exscan.
Performance is dominated by communication rounds for small input vectors.
Abstract
Parallel scan primitives compute element-wise inclusive or exclusive prefix sums of input vectors contributed by $p$ consecutively ranked processors under an associative, binary operator $\oplus$. In message-passing systems with bounded, one-ported communication capabilities, at least $\lceil\log_2 p\rceil$ or $\lceil\log_2(p-1)\rceil$ communication rounds are required to perform the scans. While there are well-known, simple algorithms for the inclusive scan that solve the problem in $\lceil\log_2 p\rceil$ communication rounds with $\lceil\log_2 p\rceil$ applications of $\oplus$ (which could be expensive), the exclusive scan appears more difficult. Conventionally, the problem is solved with either $\lceil\log_2(p-1)\rceil+1$ communication rounds (e.g., by shifting the input vectors), or in $\lceil\log_2 p\rceil$ communication rounds with $2\lceil\log_2 p\rceil-1$ applications of…
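To make the two conventional approaches named in the abstract concrete, here is a minimal sequential simulation in Python. It is a sketch only: it is not the paper's new algorithm and not an MPI implementation. The function names `inclusive_scan_doubling` and `exclusive_scan_by_shift` are hypothetical, each list index stands in for a processor rank, and one pass of the `while` loop stands in for one simultaneous send-receive round. The inclusive scan uses the familiar distance-doubling scheme; the exclusive scan is obtained by shifting every input one rank upward and then running the inclusive scan on the shifted inputs.

```python
from operator import add


def inclusive_scan_doubling(values, op):
    """Simulate a distance-doubling inclusive scan over len(values) ranks.

    In the round with distance d, rank r >= d receives the partial result
    of rank r - d and combines it on the left, so the order of operands is
    preserved for non-commutative operators.
    Returns (results, rounds, max applications of op on any rank).
    """
    p = len(values)
    partial = list(values)
    ops = [0] * p
    rounds = 0
    dist = 1
    while dist < p:
        prev = list(partial)  # snapshot: all sends in a round are simultaneous
        for r in range(dist, p):
            partial[r] = op(prev[r - dist], partial[r])
            ops[r] += 1
        rounds += 1
        dist *= 2
    return partial, rounds, max(ops, default=0)


def exclusive_scan_by_shift(values, op):
    """Simulate the shift-based exclusive scan.

    One extra round sends each input to the next higher rank; ranks
    1..p-1 then run the inclusive scan on the shifted inputs. Rank 0 has
    no predecessors, so its result is left as None (assumption: this
    mirrors the undefined result on rank 0 of MPI_Exscan).
    Returns (results, rounds).
    """
    p = len(values)
    if p <= 1:
        return [None] * p, 0
    shifted = list(values[:-1])  # rank r >= 1 now holds the input of rank r - 1
    scanned, rounds, _ = inclusive_scan_doubling(shifted, op)
    return [None] + scanned, rounds + 1


# Example with p = 8 ranks and integer addition.
inputs = [1, 2, 3, 4, 5, 6, 7, 8]
inc, inc_rounds, inc_ops = inclusive_scan_doubling(inputs, add)
exc, exc_rounds = exclusive_scan_by_shift(inputs, add)
print(inc, inc_rounds, inc_ops)  # [1, 3, 6, 10, 15, 21, 28, 36] 3 3
print(exc, exc_rounds)           # [None, 1, 3, 6, 10, 15, 21, 28] 4
```

For 8 ranks the simulated inclusive scan finishes in 3 rounds, while the shift-based exclusive scan needs 4, which is the extra round the abstract attributes to shifting the input vectors.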
